higher-performance r via c++ - part 1:...

59
Higher-Performance R via C++ Part 1: Introduction Dirk Eddelbuettel UZH/ETH Zürich R Courses June 24-25, 2015 0/58

Upload: others

Post on 08-Jul-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Higher-Performance R via C++ - Part 1: Introductiondirk.eddelbuettel.com/papers/rcpp_uzuerich_2015_part1_intro.pdf · · C++11 is a big deal giving us new language features · While

Higher-Performance R via C++Part 1 Introduction

Dirk EddelbuettelUZHETH Zuumlrich R CoursesJune 24-25 2015

058

Overview

What Are We Doing Today and Tomorrow

High-level motivation Three main questions

middot Why Several reasons discussed nextmiddot How Rcpp details usage tips hellipmiddot What We will cover examples

258

Focus on R and C++

middot R Our starting pointmiddot C++ Our extension approachmiddot why how tricks hellip

358

Before the WhyHowWhat

Maybe some mutual introductions

middot Your background (academic industry hellip)middot R experience (beginner intermediate advanced hellip)middot Created modified any R packages middot C andor C++ experience middot Main interest in Rcpp speed extensions hellip middot Following rcpp-devel or r-devel

458

Overview Why R

Why R Programming with Data

ChambersComputationalMethods for DataAnalysis Wiley1977

Becker Chambersand Wilks TheNew S LanguageChapman amp Hall1988

Chambers andHastie StatisticalModels in SChapman amp Hall1992

ChambersProgramming withData Springer1998

ChambersSoftware for DataAnalysisProgramming withR Springer 2008

Thanks to John Chambers for sending me high-resolution scans of the covers of his books

658

A Simple Example

xx lt- faithful[eruptions]fit lt- density(xx)plot(fit)

758

A Simple Example

1 2 3 4 5 6

00

01

02

03

04

05

densitydefault(x = xx)

N = 272 Bandwidth = 03348

Den

sity

858

A Simple Example - Refined

xx lt- faithful[eruptions]fit1 lt- density(xx)fit2 lt- replicate(10000

x lt- sample(xxreplace=TRUE)density(x from=min(fit1$x) to=max(fit1$x))$y

)fit3 lt- apply(fit2 1 quantilec(00250975))plot(fit1 ylim=range(fit3))polygon(c(fit1$xrev(fit1$x)) c(fit3[1]rev(fit3[2]))

col=grey border=F)lines(fit1)

958

A Simple Example - Refined

1 2 3 4 5 6

00

01

02

03

04

05

densitydefault(x = xx)

N = 272 Bandwidth = 03348

Den

sity

1058

So Why R

R enables us to

middot work interactivelymiddot explore and visualize datamiddot access retrieve andor generate datamiddot summarize and report into pdf html hellip

making it the key language for statistical computing and apreferred environment for many data analysts

1158

So Why R

R has always been extensible via

middot C via a bare-bones interface described in Writing RExtensions

middot Fortran which is also used internally by Rmiddot Java via rJava by Simon Urbanekmiddot C++ but essentially at the bare-bones level of C

So while in theory this always worked ndash it was tedious inpractice

1258

Why Extend R

Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran

Since the core of R is in fact a program written in the Clanguage itrsquos not surprising that the most direct interfaceto non-R software is for code written in C or directlycallable from C All the same including additional C codeis a serious step with some added dangers and often asubstantial amount of programming and debuggingrequired You should have a good reason

1358

Why Extend R

Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran

Since the core of R is in fact a program written in the Clanguage itrsquos not surprising that the most direct interfaceto non-R software is for code written in C or directlycallable from C All the same including additional C codeis a serious step with some added dangers and often asubstantial amount of programming and debuggingrequired You should have a good reason

1458

Why Extend R

Chambers proceeds with this rough map of the road ahead

middot Againstmiddot Itrsquos more workmiddot Bugs will bitemiddot Potential platform dependencymiddot Less readable software

middot In Favormiddot New and trusted computationsmiddot Speedmiddot Object references

1558

Why Extend R

The Why boils down to

middot speed Often a good enough reason for us hellip and a focus forus in this workshop

middot new things We can bind to libraries and tools that wouldotherwise be unavailable in R

middot references Chambers quote from 2008 foreshadowed thework on the new Reference Classes now in R and built uponvia Rcpp Modules Rcpp Classes (and also RcppR6)

1658

Overview Why C++

Why C++

middot Asking Google leads to about 52 million hitsmiddot Wikipedia C++ is a statically typed free-form

multi-paradigm compiled general-purpose powerfulprogramming language

middot C++ is industrial-strength vendor-independent widely-usedand still evolving

middot In science amp research one of the most frequently-usedlanguages If there is something you want to use connectto it probably has a CC++ API

middot As a widely used language it also has good tool support(debuggers profilers code analysis)

1858

Why C++

Scott Meyers View C++ as a federation of languages

middot C provides a rich inheritance and interoperability as UnixWindows hellip are all build on C

middot Object-Oriented C++ (maybe just to provide endlessdiscussions about exactly what OO is or should be)

middot Templated C++ which is mighty powerful template metaprogramming unequalled in other languages

middot The Standard Template Library (STL) is a specific templatelibrary which is powerful but has its own conventions

middot C++11 (and C++14 and beyond) add enough to be calleda fifth language

NB Meyers original list of four languages appeared years before C++11

1958

Why C++

middot Mature yet currentmiddot Strong performance focus

middot You donrsquot pay for what you donrsquot usemiddot Leave no room for another language between the machine

level and C++

middot Yet also powerfully abstract and high-levelmiddot C++11 is a big deal giving us new language featuresmiddot While there are complexities Rcpp users are mostly shielded

2058

Overview Vision

Bell Labs May 1976

2258

Interface Vision

R offers us the best of both worlds

middot Compiled code withmiddot Access to proven libraries and algorithms in CC++Fortranmiddot Extremely high performance (in both serial and parallel modes)

middot Interpreted code withmiddot An accessible high-level language made for Programming with

Datamiddot An interactive workflow for data analysismiddot Support for rapid prototyping research and experimentation

2358

Why Rcpp

middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples

middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure

middot Expressive as it allows for vectorised C++ using Rcpp Sugarmiddot Seamless access to all R objects vector matrix list

S3S4RefClass Environment Function hellipmiddot Speed gains for a variety of tasks Rcpp excels precisely

where R struggles loops function calls hellipmiddot Extensions greatly facilitates access to external libraries

using eg Rcpp modules

2458

Overview Speed

Speed Example 1 (due to Christian Robert)

Five different ways to compute 1(1 + x)

f lt- function(n x=1) for(i in 1n) x lt- 1(1+x)g lt- function(n x=1) for(i in 1n) x lt- (1(1+x))h lt- function(n x=1) for(i in 1n) x lt- (1+x)^(-1)j lt- function(n x=1) for(i in 1n) x lt- 11+xk lt- function(n x=1) for(i in 1n) x lt- 11+xlibrary(rbenchmark)N lt- 1e5benchmark(f(N1)g(N1)h(N1)j(N1)k(N1)order=relative)[14]

2658

Speed Example 1 (due to Christian Robert)

test replications elapsed relative 5 k(N 1) 100 6435 1000 1 f(N 1) 100 6609 1027 2 g(N 1) 100 7757 1205 4 j(N 1) 100 7882 1225 3 h(N 1) 100 11766 1828

2758

Speed Example 1 (due to Christian Robert)

Adding a C++ variant is easy

cppFunction(double m(int n double x)

for (int i=0 iltn i++)x = 1 (1+x)

return x

)

(We will learn more about cppFunction() later)

2858

Speed Example 1 (due to Christian Robert)

test replications elapsed relative 6 m(N 1) 100 0170 1000 1 f(N 1) 100 6854 40318 5 k(N 1) 100 7811 45947 4 j(N 1) 100 9183 54018 2 g(N 1) 100 9489 55818 3 h(N 1) 100 11725 68971

2958

Speed Example 2 (due to StackOverflow)

Consider a function defined as

f(n) such that n when n lt 2

f(n minus 1) + f(n minus 2) when n ge 2

3058

Speed Example 2 (due to StackOverflow)

R implementation and use

f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))

Using it on first 11 argumentssapply(010 f)

[1] 0 1 1 2 3 5 8 13 21 34 55

3158

Speed Example 2 (due to StackOverflow)

Timing

library(rbenchmark)benchmark(f(10) f(15) f(20))[14]

test replications elapsed relative 1 f(10) 100 0030 1000 2 f(15) 100 0335 11167 3 f(20) 100 3517 117233

3258

Speed Example 2 (due to StackOverflow)

int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))

deployed as

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

3358

Speed Example 2 (due to StackOverflow)

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(20) g(20) order=relative)[14]

test replications elapsed relative 2 g(20) 100 0011 1000 1 f(20) 100 3776 343273

A nice gain of a few orders of magnitude3458

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

3558

What Next

Hands-on

Programming with C++

middot C++ Basicsmiddot Debuggingmiddot Best Practices

and then on to Rcpp itself

3758

C++ Basics

Compiled not Interpreted

Need to compile and link

include ltcstdiogt

int main(void) printf(Hello worldn)return 0

3958

Compiled not Interpreted

Or streams output rather than printf

include ltiostreamgt

int main(void) stdcout ltlt Hello world ltlt stdendlreturn 0

4058

Compiled not Interpreted

g++ -o will compile and link

We will now look at an examples with explicit linking

4158

Compiled not Interpreted

include ltcstdiogt

define MATHLIB_STANDALONEinclude ltRmathhgt

int main(void) printf(N(01) 95th percentile 98fn

qnorm(095 00 10 1 0))

4258

Compiled not Interpreted

We may need to supply

middot header location via -Imiddot library location via -Lmiddot library via -llibraryname

g++ -Iusrinclude -c qnorm_rmathcppg++ -o qnorm_rmath qnorm_rmatho -Lusrlib -lRmath

4358

Statically Typed

middot R is dynamically typed x lt- 314 x lt- foo is validmiddot In C++ each variable must be declared before first usemiddot Common types are int and long (possibly with unsigned)float and double bool as well as char

middot No standard string type though stdstring is closemiddot All these variables types are scalars which is fundamentally

different from R where everything is a vectormiddot class (and struct) allow creation of composite types

classes add behaviour to data to form objectsmiddot Variables need to be declared cannot change

4458

C++ Basics A Better C

C++ is a Better C

middot control structures similar to what R offers for while ifswitch

middot functions are similar too but note the difference inpositional-only matching also same function name butdifferent arguments allowed in C++

middot pointers and memory management very different but lotsof issues people had with C can be avoided via STL (whichis something Rcpp promotes too)

middot sometimes still useful to know what a pointer is hellip

4658

Object-Oriented

This is a second key feature of C++ and it does it differentlyfrom S3 and S4

struct Date unsigned int yearunsigned int monthunsigned int day

struct Person char firstname[20]char lastname[20]struct Date birthdayunsigned long id

4758

Object-Oriented

Object-orientation in the C++ sense matches data with codeoperating on it

class Date private

unsigned int yearunsigned int monthunsigned int date

publicvoid setDate(int y int m int d)int getDay()int getMonth()int getYear()

4858

Generic Programming and the STL

The STL promotes generic programming

For example the sequence container types vector dequeand list all support

middot push_back() to insert at the endmiddot pop_back() to remove from the frontmiddot begin() returning an iterator to the first elementmiddot end() returning an iterator to just after the last elementmiddot size() for the number of elements

but only list has push_front() and pop_front()

Other useful containers set multiset map and multimap4958

Generic Programming and the STL

Traversal of containers can be achieved via iterators whichrequire suitable member functions begin() and end()

stdvectorltdoublegtconst_iterator sifor (si=sbegin() si = send() si++)

stdcout ltlt si ltlt stdendl

5058

Generic Programming and the STL

Another key STL part are algorithms

double sum = accumulate(sbegin() send() 0)

Some other STL algorithms are

middot find finds the first element equal to the supplied valuemiddot count counts the number of matching elementsmiddot transform applies a supplied function to each elementmiddot for_each sweeps over all elements does not altermiddot inner_product inner product of two vectors

5158

Template Programming

Template programming provides a lsquolanguage within C++rsquocode gets evaluated during compilation

One of the simplest template examples is

template lttypename Tgtconst Tamp min(const Tamp x const Tamp y)

return y lt x y x

This can now be used to compute the minimum between twoint variables or double or in fact any admissible typeproviding an operatorlt() for less-than comparison

5258

Template Programming

Another template example is a class squaring its argument

template lttypename Tgtclass square public stdunary_functionltTTgt public

T operator()(T t) const return tt

which can be used along with STL algorithms

transform(xbegin() xend() square)

5358

Further Reading

Books by Meyers are excellent

I also like the (free) C++ Annotations

C++ FAQ

Resources on StackOverflow such as

middot general info and its links egmiddot booklist

5458

Debugging

Debugging

Some tips

middot Generally painful old-school printf() still pervasivemiddot Debuggers go along with compilers gdb for gcc and g++lldb for the clang llvm family

middot Extra tools such as valgrind helpful for memory debuggingmiddot ldquoSanitizerrdquo (ASANUBSAN) in newer versions of g++ andclang++

5658

Best Practices

Best Practices

Version control highly recommended to become familiar withgit or svn

Editor in the long-run recommended to learn productivitytricks for one editor emacs vi eclipse RStudio hellip

5858

  • Overview
    • Why R
    • Why C++
    • Vision
    • Speed
      • What Next
      • C++ Basics
        • A Better C
          • Debugging
          • Best Practices
Page 2: Higher-Performance R via C++ - Part 1: Introductiondirk.eddelbuettel.com/papers/rcpp_uzuerich_2015_part1_intro.pdf · · C++11 is a big deal giving us new language features · While

Overview

What Are We Doing Today and Tomorrow

High-level motivation Three main questions

middot Why Several reasons discussed nextmiddot How Rcpp details usage tips hellipmiddot What We will cover examples

258

Focus on R and C++

middot R Our starting pointmiddot C++ Our extension approachmiddot why how tricks hellip

358

Before the WhyHowWhat

Maybe some mutual introductions

middot Your background (academic industry hellip)middot R experience (beginner intermediate advanced hellip)middot Created modified any R packages middot C andor C++ experience middot Main interest in Rcpp speed extensions hellip middot Following rcpp-devel or r-devel

458

Overview Why R

Why R Programming with Data

ChambersComputationalMethods for DataAnalysis Wiley1977

Becker Chambersand Wilks TheNew S LanguageChapman amp Hall1988

Chambers andHastie StatisticalModels in SChapman amp Hall1992

ChambersProgramming withData Springer1998

ChambersSoftware for DataAnalysisProgramming withR Springer 2008

Thanks to John Chambers for sending me high-resolution scans of the covers of his books

658

A Simple Example

xx lt- faithful[eruptions]fit lt- density(xx)plot(fit)

758

A Simple Example

1 2 3 4 5 6

00

01

02

03

04

05

densitydefault(x = xx)

N = 272 Bandwidth = 03348

Den

sity

858

A Simple Example - Refined

xx lt- faithful[eruptions]fit1 lt- density(xx)fit2 lt- replicate(10000

x lt- sample(xxreplace=TRUE)density(x from=min(fit1$x) to=max(fit1$x))$y

)fit3 lt- apply(fit2 1 quantilec(00250975))plot(fit1 ylim=range(fit3))polygon(c(fit1$xrev(fit1$x)) c(fit3[1]rev(fit3[2]))

col=grey border=F)lines(fit1)

958

A Simple Example - Refined

1 2 3 4 5 6

00

01

02

03

04

05

densitydefault(x = xx)

N = 272 Bandwidth = 03348

Den

sity

1058

So Why R

R enables us to

middot work interactivelymiddot explore and visualize datamiddot access retrieve andor generate datamiddot summarize and report into pdf html hellip

making it the key language for statistical computing and apreferred environment for many data analysts

1158

So Why R

R has always been extensible via

middot C via a bare-bones interface described in Writing RExtensions

middot Fortran which is also used internally by Rmiddot Java via rJava by Simon Urbanekmiddot C++ but essentially at the bare-bones level of C

So while in theory this always worked ndash it was tedious inpractice

1258

Why Extend R

Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran

Since the core of R is in fact a program written in the Clanguage itrsquos not surprising that the most direct interfaceto non-R software is for code written in C or directlycallable from C All the same including additional C codeis a serious step with some added dangers and often asubstantial amount of programming and debuggingrequired You should have a good reason

1358

Why Extend R

Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran

Since the core of R is in fact a program written in the Clanguage itrsquos not surprising that the most direct interfaceto non-R software is for code written in C or directlycallable from C All the same including additional C codeis a serious step with some added dangers and often asubstantial amount of programming and debuggingrequired You should have a good reason

1458

Why Extend R

Chambers proceeds with this rough map of the road ahead

middot Againstmiddot Itrsquos more workmiddot Bugs will bitemiddot Potential platform dependencymiddot Less readable software

middot In Favormiddot New and trusted computationsmiddot Speedmiddot Object references

1558

Why Extend R

The Why boils down to

middot speed Often a good enough reason for us hellip and a focus forus in this workshop

middot new things We can bind to libraries and tools that wouldotherwise be unavailable in R

middot references Chambers quote from 2008 foreshadowed thework on the new Reference Classes now in R and built uponvia Rcpp Modules Rcpp Classes (and also RcppR6)

1658

Overview Why C++

Why C++

middot Asking Google leads to about 52 million hitsmiddot Wikipedia C++ is a statically typed free-form

multi-paradigm compiled general-purpose powerfulprogramming language

middot C++ is industrial-strength vendor-independent widely-usedand still evolving

middot In science amp research one of the most frequently-usedlanguages If there is something you want to use connectto it probably has a CC++ API

middot As a widely used language it also has good tool support(debuggers profilers code analysis)

1858

Why C++

Scott Meyers View C++ as a federation of languages

middot C provides a rich inheritance and interoperability as UnixWindows hellip are all build on C

middot Object-Oriented C++ (maybe just to provide endlessdiscussions about exactly what OO is or should be)

middot Templated C++ which is mighty powerful template metaprogramming unequalled in other languages

middot The Standard Template Library (STL) is a specific templatelibrary which is powerful but has its own conventions

middot C++11 (and C++14 and beyond) add enough to be calleda fifth language

NB Meyers original list of four languages appeared years before C++11

1958

Why C++

middot Mature yet currentmiddot Strong performance focus

middot You donrsquot pay for what you donrsquot usemiddot Leave no room for another language between the machine

level and C++

middot Yet also powerfully abstract and high-levelmiddot C++11 is a big deal giving us new language featuresmiddot While there are complexities Rcpp users are mostly shielded

2058

Overview Vision

Bell Labs May 1976

2258

Interface Vision

R offers us the best of both worlds

middot Compiled code withmiddot Access to proven libraries and algorithms in CC++Fortranmiddot Extremely high performance (in both serial and parallel modes)

middot Interpreted code withmiddot An accessible high-level language made for Programming with

Datamiddot An interactive workflow for data analysismiddot Support for rapid prototyping research and experimentation

2358

Why Rcpp

middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples

middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure

middot Expressive as it allows for vectorised C++ using Rcpp Sugarmiddot Seamless access to all R objects vector matrix list

S3S4RefClass Environment Function hellipmiddot Speed gains for a variety of tasks Rcpp excels precisely

where R struggles loops function calls hellipmiddot Extensions greatly facilitates access to external libraries

using eg Rcpp modules

2458

Overview Speed

Speed Example 1 (due to Christian Robert)

Five different ways to compute 1(1 + x)

f lt- function(n x=1) for(i in 1n) x lt- 1(1+x)g lt- function(n x=1) for(i in 1n) x lt- (1(1+x))h lt- function(n x=1) for(i in 1n) x lt- (1+x)^(-1)j lt- function(n x=1) for(i in 1n) x lt- 11+xk lt- function(n x=1) for(i in 1n) x lt- 11+xlibrary(rbenchmark)N lt- 1e5benchmark(f(N1)g(N1)h(N1)j(N1)k(N1)order=relative)[14]

2658

Speed Example 1 (due to Christian Robert)

test replications elapsed relative 5 k(N 1) 100 6435 1000 1 f(N 1) 100 6609 1027 2 g(N 1) 100 7757 1205 4 j(N 1) 100 7882 1225 3 h(N 1) 100 11766 1828

2758

Speed Example 1 (due to Christian Robert)

Adding a C++ variant is easy

cppFunction(double m(int n double x)

for (int i=0 iltn i++)x = 1 (1+x)

return x

)

(We will learn more about cppFunction() later)

2858

Speed Example 1 (due to Christian Robert)

test replications elapsed relative 6 m(N 1) 100 0170 1000 1 f(N 1) 100 6854 40318 5 k(N 1) 100 7811 45947 4 j(N 1) 100 9183 54018 2 g(N 1) 100 9489 55818 3 h(N 1) 100 11725 68971

2958

Speed Example 2 (due to StackOverflow)

Consider a function defined as

f(n) such that n when n lt 2

f(n minus 1) + f(n minus 2) when n ge 2

3058

Speed Example 2 (due to StackOverflow)

R implementation and use

f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))

Using it on first 11 argumentssapply(010 f)

[1] 0 1 1 2 3 5 8 13 21 34 55

3158

Speed Example 2 (due to StackOverflow)

Timing

library(rbenchmark)benchmark(f(10) f(15) f(20))[14]

test replications elapsed relative 1 f(10) 100 0030 1000 2 f(15) 100 0335 11167 3 f(20) 100 3517 117233

3258

Speed Example 2 (due to StackOverflow)

int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))

deployed as

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

3358

Speed Example 2 (due to StackOverflow)

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(20) g(20) order=relative)[14]

test replications elapsed relative 2 g(20) 100 0011 1000 1 f(20) 100 3776 343273

A nice gain of a few orders of magnitude3458

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

3558

What Next

Hands-on

Programming with C++

middot C++ Basicsmiddot Debuggingmiddot Best Practices

and then on to Rcpp itself

3758

C++ Basics

Compiled not Interpreted

Need to compile and link

include ltcstdiogt

int main(void) printf(Hello worldn)return 0

3958

Compiled not Interpreted

Or streams output rather than printf

include ltiostreamgt

int main(void) stdcout ltlt Hello world ltlt stdendlreturn 0

4058

Compiled not Interpreted

g++ -o will compile and link

We will now look at an examples with explicit linking

4158

Compiled not Interpreted

include ltcstdiogt

define MATHLIB_STANDALONEinclude ltRmathhgt

int main(void) printf(N(01) 95th percentile 98fn

qnorm(095 00 10 1 0))

4258

Compiled not Interpreted

We may need to supply

middot header location via -Imiddot library location via -Lmiddot library via -llibraryname

g++ -Iusrinclude -c qnorm_rmathcppg++ -o qnorm_rmath qnorm_rmatho -Lusrlib -lRmath

4358

Statically Typed

middot R is dynamically typed x lt- 314 x lt- foo is validmiddot In C++ each variable must be declared before first usemiddot Common types are int and long (possibly with unsigned)float and double bool as well as char

middot No standard string type though stdstring is closemiddot All these variables types are scalars which is fundamentally

different from R where everything is a vectormiddot class (and struct) allow creation of composite types

classes add behaviour to data to form objectsmiddot Variables need to be declared cannot change

4458

C++ Basics A Better C

C++ is a Better C

middot control structures similar to what R offers for while ifswitch

middot functions are similar too but note the difference inpositional-only matching also same function name butdifferent arguments allowed in C++

middot pointers and memory management very different but lotsof issues people had with C can be avoided via STL (whichis something Rcpp promotes too)

middot sometimes still useful to know what a pointer is hellip

4658

Object-Oriented

This is a second key feature of C++ and it does it differentlyfrom S3 and S4

struct Date unsigned int yearunsigned int monthunsigned int day

struct Person char firstname[20]char lastname[20]struct Date birthdayunsigned long id

4758

Object-Oriented

Object-orientation in the C++ sense matches data with codeoperating on it

class Date private

unsigned int yearunsigned int monthunsigned int date

publicvoid setDate(int y int m int d)int getDay()int getMonth()int getYear()

4858

Generic Programming and the STL

The STL promotes generic programming

For example the sequence container types vector dequeand list all support

middot push_back() to insert at the endmiddot pop_back() to remove from the frontmiddot begin() returning an iterator to the first elementmiddot end() returning an iterator to just after the last elementmiddot size() for the number of elements

but only list has push_front() and pop_front()

Other useful containers set multiset map and multimap4958

Generic Programming and the STL

Traversal of containers can be achieved via iterators whichrequire suitable member functions begin() and end()

stdvectorltdoublegtconst_iterator sifor (si=sbegin() si = send() si++)

stdcout ltlt si ltlt stdendl

5058

Generic Programming and the STL

Another key STL part are algorithms

double sum = accumulate(sbegin() send() 0)

Some other STL algorithms are

middot find finds the first element equal to the supplied valuemiddot count counts the number of matching elementsmiddot transform applies a supplied function to each elementmiddot for_each sweeps over all elements does not altermiddot inner_product inner product of two vectors

5158

Template Programming

Template programming provides a lsquolanguage within C++rsquocode gets evaluated during compilation

One of the simplest template examples is

template lttypename Tgtconst Tamp min(const Tamp x const Tamp y)

return y lt x y x

This can now be used to compute the minimum between twoint variables or double or in fact any admissible typeproviding an operatorlt() for less-than comparison

5258

Template Programming

Another template example is a class squaring its argument

template lttypename Tgtclass square public stdunary_functionltTTgt public

T operator()(T t) const return tt

which can be used along with STL algorithms

transform(xbegin() xend() square)

5358

Further Reading

Books by Meyers are excellent

I also like the (free) C++ Annotations

C++ FAQ

Resources on StackOverflow such as

middot general info and its links egmiddot booklist

5458

Debugging

Debugging

Some tips

middot Generally painful old-school printf() still pervasivemiddot Debuggers go along with compilers gdb for gcc and g++lldb for the clang llvm family

middot Extra tools such as valgrind helpful for memory debuggingmiddot ldquoSanitizerrdquo (ASANUBSAN) in newer versions of g++ andclang++

5658

Best Practices

Best Practices

Version control highly recommended to become familiar withgit or svn

Editor in the long-run recommended to learn productivitytricks for one editor emacs vi eclipse RStudio hellip

5858

  • Overview
    • Why R
    • Why C++
    • Vision
    • Speed
      • What Next
      • C++ Basics
        • A Better C
          • Debugging
          • Best Practices
Page 3: Higher-Performance R via C++ - Part 1: Introductiondirk.eddelbuettel.com/papers/rcpp_uzuerich_2015_part1_intro.pdf · · C++11 is a big deal giving us new language features · While

What Are We Doing Today and Tomorrow

High-level motivation Three main questions

middot Why Several reasons discussed nextmiddot How Rcpp details usage tips hellipmiddot What We will cover examples

258

Focus on R and C++

middot R Our starting pointmiddot C++ Our extension approachmiddot why how tricks hellip

358

Before the WhyHowWhat

Maybe some mutual introductions

middot Your background (academic industry hellip)middot R experience (beginner intermediate advanced hellip)middot Created modified any R packages middot C andor C++ experience middot Main interest in Rcpp speed extensions hellip middot Following rcpp-devel or r-devel

458

Overview Why R

Why R Programming with Data

ChambersComputationalMethods for DataAnalysis Wiley1977

Becker Chambersand Wilks TheNew S LanguageChapman amp Hall1988

Chambers andHastie StatisticalModels in SChapman amp Hall1992

ChambersProgramming withData Springer1998

ChambersSoftware for DataAnalysisProgramming withR Springer 2008

Thanks to John Chambers for sending me high-resolution scans of the covers of his books

658

A Simple Example

xx lt- faithful[eruptions]fit lt- density(xx)plot(fit)

758

A Simple Example

1 2 3 4 5 6

00

01

02

03

04

05

densitydefault(x = xx)

N = 272 Bandwidth = 03348

Den

sity

858

A Simple Example - Refined

xx lt- faithful[eruptions]fit1 lt- density(xx)fit2 lt- replicate(10000

x lt- sample(xxreplace=TRUE)density(x from=min(fit1$x) to=max(fit1$x))$y

)fit3 lt- apply(fit2 1 quantilec(00250975))plot(fit1 ylim=range(fit3))polygon(c(fit1$xrev(fit1$x)) c(fit3[1]rev(fit3[2]))

col=grey border=F)lines(fit1)

958

A Simple Example - Refined

1 2 3 4 5 6

00

01

02

03

04

05

densitydefault(x = xx)

N = 272 Bandwidth = 03348

Den

sity

1058

So Why R

R enables us to

middot work interactivelymiddot explore and visualize datamiddot access retrieve andor generate datamiddot summarize and report into pdf html hellip

making it the key language for statistical computing and apreferred environment for many data analysts

1158

So Why R

R has always been extensible via

middot C via a bare-bones interface described in Writing RExtensions

middot Fortran which is also used internally by Rmiddot Java via rJava by Simon Urbanekmiddot C++ but essentially at the bare-bones level of C

So while in theory this always worked ndash it was tedious inpractice

1258

Why Extend R

Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran

Since the core of R is in fact a program written in the Clanguage itrsquos not surprising that the most direct interfaceto non-R software is for code written in C or directlycallable from C All the same including additional C codeis a serious step with some added dangers and often asubstantial amount of programming and debuggingrequired You should have a good reason

1358

Why Extend R

Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran

Since the core of R is in fact a program written in the Clanguage itrsquos not surprising that the most direct interfaceto non-R software is for code written in C or directlycallable from C All the same including additional C codeis a serious step with some added dangers and often asubstantial amount of programming and debuggingrequired You should have a good reason

1458

Why Extend R

Chambers proceeds with this rough map of the road ahead

middot Againstmiddot Itrsquos more workmiddot Bugs will bitemiddot Potential platform dependencymiddot Less readable software

middot In Favormiddot New and trusted computationsmiddot Speedmiddot Object references

1558

Why Extend R

The Why boils down to

middot speed Often a good enough reason for us hellip and a focus forus in this workshop

middot new things We can bind to libraries and tools that wouldotherwise be unavailable in R

middot references Chambers quote from 2008 foreshadowed thework on the new Reference Classes now in R and built uponvia Rcpp Modules Rcpp Classes (and also RcppR6)

1658

Overview Why C++

Why C++

middot Asking Google leads to about 52 million hitsmiddot Wikipedia C++ is a statically typed free-form

multi-paradigm compiled general-purpose powerfulprogramming language

middot C++ is industrial-strength vendor-independent widely-usedand still evolving

middot In science amp research one of the most frequently-usedlanguages If there is something you want to use connectto it probably has a CC++ API

middot As a widely used language it also has good tool support(debuggers profilers code analysis)

1858

Why C++

Scott Meyers View C++ as a federation of languages

middot C provides a rich inheritance and interoperability as UnixWindows hellip are all build on C

middot Object-Oriented C++ (maybe just to provide endlessdiscussions about exactly what OO is or should be)

middot Templated C++ which is mighty powerful template metaprogramming unequalled in other languages

middot The Standard Template Library (STL) is a specific templatelibrary which is powerful but has its own conventions

middot C++11 (and C++14 and beyond) add enough to be calleda fifth language

NB Meyers original list of four languages appeared years before C++11

1958

Why C++

middot Mature yet currentmiddot Strong performance focus

middot You donrsquot pay for what you donrsquot usemiddot Leave no room for another language between the machine

level and C++

middot Yet also powerfully abstract and high-levelmiddot C++11 is a big deal giving us new language featuresmiddot While there are complexities Rcpp users are mostly shielded

2058

Overview Vision

Bell Labs May 1976

2258

Interface Vision

R offers us the best of both worlds

middot Compiled code withmiddot Access to proven libraries and algorithms in CC++Fortranmiddot Extremely high performance (in both serial and parallel modes)

middot Interpreted code withmiddot An accessible high-level language made for Programming with

Datamiddot An interactive workflow for data analysismiddot Support for rapid prototyping research and experimentation

2358

Why Rcpp

middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples

middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure

middot Expressive as it allows for vectorised C++ using Rcpp Sugarmiddot Seamless access to all R objects vector matrix list

S3S4RefClass Environment Function hellipmiddot Speed gains for a variety of tasks Rcpp excels precisely

where R struggles loops function calls hellipmiddot Extensions greatly facilitates access to external libraries

using eg Rcpp modules

2458

Overview Speed

Speed Example 1 (due to Christian Robert)

Five different ways to compute 1(1 + x)

f lt- function(n x=1) for(i in 1n) x lt- 1(1+x)g lt- function(n x=1) for(i in 1n) x lt- (1(1+x))h lt- function(n x=1) for(i in 1n) x lt- (1+x)^(-1)j lt- function(n x=1) for(i in 1n) x lt- 11+xk lt- function(n x=1) for(i in 1n) x lt- 11+xlibrary(rbenchmark)N lt- 1e5benchmark(f(N1)g(N1)h(N1)j(N1)k(N1)order=relative)[14]

2658

Speed Example 1 (due to Christian Robert)

test replications elapsed relative 5 k(N 1) 100 6435 1000 1 f(N 1) 100 6609 1027 2 g(N 1) 100 7757 1205 4 j(N 1) 100 7882 1225 3 h(N 1) 100 11766 1828

2758

Speed Example 1 (due to Christian Robert)

Adding a C++ variant is easy

cppFunction(double m(int n double x)

for (int i=0 iltn i++)x = 1 (1+x)

return x

)

(We will learn more about cppFunction() later)

2858

Speed Example 1 (due to Christian Robert)

test replications elapsed relative 6 m(N 1) 100 0170 1000 1 f(N 1) 100 6854 40318 5 k(N 1) 100 7811 45947 4 j(N 1) 100 9183 54018 2 g(N 1) 100 9489 55818 3 h(N 1) 100 11725 68971

2958

Speed Example 2 (due to StackOverflow)

Consider a function defined as

f(n) such that n when n lt 2

f(n minus 1) + f(n minus 2) when n ge 2

3058

Speed Example 2 (due to StackOverflow)

R implementation and use

f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))

Using it on first 11 argumentssapply(010 f)

[1] 0 1 1 2 3 5 8 13 21 34 55

3158

Speed Example 2 (due to StackOverflow)

Timing

library(rbenchmark)benchmark(f(10) f(15) f(20))[14]

test replications elapsed relative 1 f(10) 100 0030 1000 2 f(15) 100 0335 11167 3 f(20) 100 3517 117233

3258

Speed Example 2 (due to StackOverflow)

int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))

deployed as

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

3358

Speed Example 2 (due to StackOverflow)

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(20) g(20) order=relative)[14]

test replications elapsed relative 2 g(20) 100 0011 1000 1 f(20) 100 3776 343273

A nice gain of a few orders of magnitude3458

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

3558

What Next

Hands-on

Programming with C++

middot C++ Basicsmiddot Debuggingmiddot Best Practices

and then on to Rcpp itself

3758

C++ Basics

Compiled not Interpreted

Need to compile and link

include ltcstdiogt

int main(void) printf(Hello worldn)return 0

3958

Compiled not Interpreted

Or streams output rather than printf

include ltiostreamgt

int main(void) stdcout ltlt Hello world ltlt stdendlreturn 0

4058

Compiled not Interpreted

g++ -o will compile and link

We will now look at an examples with explicit linking

4158

Compiled not Interpreted

include ltcstdiogt

define MATHLIB_STANDALONEinclude ltRmathhgt

int main(void) printf(N(01) 95th percentile 98fn

qnorm(095 00 10 1 0))

4258

Compiled not Interpreted

We may need to supply

middot header location via -Imiddot library location via -Lmiddot library via -llibraryname

g++ -Iusrinclude -c qnorm_rmathcppg++ -o qnorm_rmath qnorm_rmatho -Lusrlib -lRmath

4358

Statically Typed

middot R is dynamically typed x lt- 314 x lt- foo is validmiddot In C++ each variable must be declared before first usemiddot Common types are int and long (possibly with unsigned)float and double bool as well as char

middot No standard string type though stdstring is closemiddot All these variables types are scalars which is fundamentally

different from R where everything is a vectormiddot class (and struct) allow creation of composite types

classes add behaviour to data to form objectsmiddot Variables need to be declared cannot change

4458

C++ Basics A Better C

C++ is a Better C

middot control structures similar to what R offers for while ifswitch

middot functions are similar too but note the difference inpositional-only matching also same function name butdifferent arguments allowed in C++

middot pointers and memory management very different but lotsof issues people had with C can be avoided via STL (whichis something Rcpp promotes too)

middot sometimes still useful to know what a pointer is hellip

4658

Object-Oriented

This is a second key feature of C++ and it does it differentlyfrom S3 and S4

struct Date unsigned int yearunsigned int monthunsigned int day

struct Person char firstname[20]char lastname[20]struct Date birthdayunsigned long id

4758

Object-Oriented

Object-orientation in the C++ sense matches data with codeoperating on it

class Date private

unsigned int yearunsigned int monthunsigned int date

publicvoid setDate(int y int m int d)int getDay()int getMonth()int getYear()

4858

Generic Programming and the STL

The STL promotes generic programming

For example the sequence container types vector dequeand list all support

middot push_back() to insert at the endmiddot pop_back() to remove from the frontmiddot begin() returning an iterator to the first elementmiddot end() returning an iterator to just after the last elementmiddot size() for the number of elements

but only list has push_front() and pop_front()

Other useful containers set multiset map and multimap4958

Generic Programming and the STL

Traversal of containers can be achieved via iterators whichrequire suitable member functions begin() and end()

stdvectorltdoublegtconst_iterator sifor (si=sbegin() si = send() si++)

stdcout ltlt si ltlt stdendl

5058

Generic Programming and the STL

Another key STL part are algorithms

double sum = accumulate(sbegin() send() 0)

Some other STL algorithms are

middot find finds the first element equal to the supplied valuemiddot count counts the number of matching elementsmiddot transform applies a supplied function to each elementmiddot for_each sweeps over all elements does not altermiddot inner_product inner product of two vectors

5158

Template Programming

Template programming provides a lsquolanguage within C++rsquocode gets evaluated during compilation

One of the simplest template examples is

template lttypename Tgtconst Tamp min(const Tamp x const Tamp y)

return y lt x y x

This can now be used to compute the minimum between twoint variables or double or in fact any admissible typeproviding an operatorlt() for less-than comparison

5258

Template Programming

Another template example is a class squaring its argument

template lttypename Tgtclass square public stdunary_functionltTTgt public

T operator()(T t) const return tt

which can be used along with STL algorithms

transform(xbegin() xend() square)

5358

Further Reading

Books by Meyers are excellent

I also like the (free) C++ Annotations

C++ FAQ

Resources on StackOverflow such as

middot general info and its links egmiddot booklist

5458

Debugging

Debugging

Some tips

middot Generally painful old-school printf() still pervasivemiddot Debuggers go along with compilers gdb for gcc and g++lldb for the clang llvm family

middot Extra tools such as valgrind helpful for memory debuggingmiddot ldquoSanitizerrdquo (ASANUBSAN) in newer versions of g++ andclang++

5658

Best Practices

Best Practices

Version control highly recommended to become familiar withgit or svn

Editor in the long-run recommended to learn productivitytricks for one editor emacs vi eclipse RStudio hellip

5858

  • Overview
    • Why R
    • Why C++
    • Vision
    • Speed
      • What Next
      • C++ Basics
        • A Better C
          • Debugging
          • Best Practices
Page 4: Higher-Performance R via C++ - Part 1: Introductiondirk.eddelbuettel.com/papers/rcpp_uzuerich_2015_part1_intro.pdf · · C++11 is a big deal giving us new language features · While

Focus on R and C++

middot R Our starting pointmiddot C++ Our extension approachmiddot why how tricks hellip

358

Before the WhyHowWhat

Maybe some mutual introductions

middot Your background (academic industry hellip)middot R experience (beginner intermediate advanced hellip)middot Created modified any R packages middot C andor C++ experience middot Main interest in Rcpp speed extensions hellip middot Following rcpp-devel or r-devel

458

Overview Why R

Why R Programming with Data

ChambersComputationalMethods for DataAnalysis Wiley1977

Becker Chambersand Wilks TheNew S LanguageChapman amp Hall1988

Chambers andHastie StatisticalModels in SChapman amp Hall1992

ChambersProgramming withData Springer1998

ChambersSoftware for DataAnalysisProgramming withR Springer 2008

Thanks to John Chambers for sending me high-resolution scans of the covers of his books

658

A Simple Example

xx lt- faithful[eruptions]fit lt- density(xx)plot(fit)

758

A Simple Example

1 2 3 4 5 6

00

01

02

03

04

05

densitydefault(x = xx)

N = 272 Bandwidth = 03348

Den

sity

858

A Simple Example - Refined

xx lt- faithful[eruptions]fit1 lt- density(xx)fit2 lt- replicate(10000

x lt- sample(xxreplace=TRUE)density(x from=min(fit1$x) to=max(fit1$x))$y

)fit3 lt- apply(fit2 1 quantilec(00250975))plot(fit1 ylim=range(fit3))polygon(c(fit1$xrev(fit1$x)) c(fit3[1]rev(fit3[2]))

col=grey border=F)lines(fit1)

958

A Simple Example - Refined

1 2 3 4 5 6

00

01

02

03

04

05

densitydefault(x = xx)

N = 272 Bandwidth = 03348

Den

sity

1058

So Why R

R enables us to

middot work interactivelymiddot explore and visualize datamiddot access retrieve andor generate datamiddot summarize and report into pdf html hellip

making it the key language for statistical computing and apreferred environment for many data analysts

1158

So Why R

R has always been extensible via

middot C via a bare-bones interface described in Writing RExtensions

middot Fortran which is also used internally by Rmiddot Java via rJava by Simon Urbanekmiddot C++ but essentially at the bare-bones level of C

So while in theory this always worked ndash it was tedious inpractice

1258

Why Extend R

Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran

Since the core of R is in fact a program written in the Clanguage itrsquos not surprising that the most direct interfaceto non-R software is for code written in C or directlycallable from C All the same including additional C codeis a serious step with some added dangers and often asubstantial amount of programming and debuggingrequired You should have a good reason

1358

Why Extend R

Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran

Since the core of R is in fact a program written in the Clanguage itrsquos not surprising that the most direct interfaceto non-R software is for code written in C or directlycallable from C All the same including additional C codeis a serious step with some added dangers and often asubstantial amount of programming and debuggingrequired You should have a good reason

1458

Why Extend R

Chambers proceeds with this rough map of the road ahead

middot Againstmiddot Itrsquos more workmiddot Bugs will bitemiddot Potential platform dependencymiddot Less readable software

middot In Favormiddot New and trusted computationsmiddot Speedmiddot Object references

1558

Why Extend R

The Why boils down to

middot speed Often a good enough reason for us hellip and a focus forus in this workshop

middot new things We can bind to libraries and tools that wouldotherwise be unavailable in R

middot references Chambers quote from 2008 foreshadowed thework on the new Reference Classes now in R and built uponvia Rcpp Modules Rcpp Classes (and also RcppR6)

1658

Overview Why C++

Why C++

middot Asking Google leads to about 52 million hitsmiddot Wikipedia C++ is a statically typed free-form

multi-paradigm compiled general-purpose powerfulprogramming language

middot C++ is industrial-strength vendor-independent widely-usedand still evolving

middot In science amp research one of the most frequently-usedlanguages If there is something you want to use connectto it probably has a CC++ API

middot As a widely used language it also has good tool support(debuggers profilers code analysis)

1858

Why C++

Scott Meyers View C++ as a federation of languages

middot C provides a rich inheritance and interoperability as UnixWindows hellip are all build on C

middot Object-Oriented C++ (maybe just to provide endlessdiscussions about exactly what OO is or should be)

middot Templated C++ which is mighty powerful template metaprogramming unequalled in other languages

middot The Standard Template Library (STL) is a specific templatelibrary which is powerful but has its own conventions

middot C++11 (and C++14 and beyond) add enough to be calleda fifth language

NB Meyers original list of four languages appeared years before C++11

1958

Why C++

middot Mature yet currentmiddot Strong performance focus

middot You donrsquot pay for what you donrsquot usemiddot Leave no room for another language between the machine

level and C++

middot Yet also powerfully abstract and high-levelmiddot C++11 is a big deal giving us new language featuresmiddot While there are complexities Rcpp users are mostly shielded

2058

Overview Vision

Bell Labs May 1976

2258

Interface Vision

R offers us the best of both worlds

middot Compiled code withmiddot Access to proven libraries and algorithms in CC++Fortranmiddot Extremely high performance (in both serial and parallel modes)

middot Interpreted code withmiddot An accessible high-level language made for Programming with

Datamiddot An interactive workflow for data analysismiddot Support for rapid prototyping research and experimentation

2358

Why Rcpp

middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples

middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure

middot Expressive as it allows for vectorised C++ using Rcpp Sugarmiddot Seamless access to all R objects vector matrix list

S3S4RefClass Environment Function hellipmiddot Speed gains for a variety of tasks Rcpp excels precisely

where R struggles loops function calls hellipmiddot Extensions greatly facilitates access to external libraries

using eg Rcpp modules

2458

Overview Speed

Speed Example 1 (due to Christian Robert)

Five different ways to compute 1(1 + x)

f lt- function(n x=1) for(i in 1n) x lt- 1(1+x)g lt- function(n x=1) for(i in 1n) x lt- (1(1+x))h lt- function(n x=1) for(i in 1n) x lt- (1+x)^(-1)j lt- function(n x=1) for(i in 1n) x lt- 11+xk lt- function(n x=1) for(i in 1n) x lt- 11+xlibrary(rbenchmark)N lt- 1e5benchmark(f(N1)g(N1)h(N1)j(N1)k(N1)order=relative)[14]

2658

Speed Example 1 (due to Christian Robert)

test replications elapsed relative 5 k(N 1) 100 6435 1000 1 f(N 1) 100 6609 1027 2 g(N 1) 100 7757 1205 4 j(N 1) 100 7882 1225 3 h(N 1) 100 11766 1828

2758

Speed Example 1 (due to Christian Robert)

Adding a C++ variant is easy

cppFunction(double m(int n double x)

for (int i=0 iltn i++)x = 1 (1+x)

return x

)

(We will learn more about cppFunction() later)

2858

Speed Example 1 (due to Christian Robert)

test replications elapsed relative 6 m(N 1) 100 0170 1000 1 f(N 1) 100 6854 40318 5 k(N 1) 100 7811 45947 4 j(N 1) 100 9183 54018 2 g(N 1) 100 9489 55818 3 h(N 1) 100 11725 68971

2958

Speed Example 2 (due to StackOverflow)

Consider a function defined as

f(n) such that n when n lt 2

f(n minus 1) + f(n minus 2) when n ge 2

3058

Speed Example 2 (due to StackOverflow)

R implementation and use

f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))

Using it on first 11 argumentssapply(010 f)

[1] 0 1 1 2 3 5 8 13 21 34 55

3158

Speed Example 2 (due to StackOverflow)

Timing

library(rbenchmark)benchmark(f(10) f(15) f(20))[14]

test replications elapsed relative 1 f(10) 100 0030 1000 2 f(15) 100 0335 11167 3 f(20) 100 3517 117233

3258

Speed Example 2 (due to StackOverflow)

int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))

deployed as

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

3358

Speed Example 2 (due to StackOverflow)

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(20) g(20) order=relative)[14]

test replications elapsed relative 2 g(20) 100 0011 1000 1 f(20) 100 3776 343273

A nice gain of a few orders of magnitude3458

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

3558

What Next

Hands-on

Programming with C++

middot C++ Basicsmiddot Debuggingmiddot Best Practices

and then on to Rcpp itself

3758

C++ Basics

Compiled not Interpreted

Need to compile and link

include ltcstdiogt

int main(void) printf(Hello worldn)return 0

3958

Compiled not Interpreted

Or streams output rather than printf

include ltiostreamgt

int main(void) stdcout ltlt Hello world ltlt stdendlreturn 0

4058

Compiled not Interpreted

g++ -o will compile and link

We will now look at an examples with explicit linking

4158

Compiled not Interpreted

include ltcstdiogt

define MATHLIB_STANDALONEinclude ltRmathhgt

int main(void) printf(N(01) 95th percentile 98fn

qnorm(095 00 10 1 0))

4258

Compiled not Interpreted

We may need to supply

middot header location via -Imiddot library location via -Lmiddot library via -llibraryname

g++ -Iusrinclude -c qnorm_rmathcppg++ -o qnorm_rmath qnorm_rmatho -Lusrlib -lRmath

4358

Statically Typed

middot R is dynamically typed x lt- 314 x lt- foo is validmiddot In C++ each variable must be declared before first usemiddot Common types are int and long (possibly with unsigned)float and double bool as well as char

middot No standard string type though stdstring is closemiddot All these variables types are scalars which is fundamentally

different from R where everything is a vectormiddot class (and struct) allow creation of composite types

classes add behaviour to data to form objectsmiddot Variables need to be declared cannot change

4458

C++ Basics A Better C

C++ is a Better C

middot control structures similar to what R offers for while ifswitch

middot functions are similar too but note the difference inpositional-only matching also same function name butdifferent arguments allowed in C++

middot pointers and memory management very different but lotsof issues people had with C can be avoided via STL (whichis something Rcpp promotes too)

middot sometimes still useful to know what a pointer is hellip

4658

Object-Oriented

This is a second key feature of C++ and it does it differentlyfrom S3 and S4

struct Date unsigned int yearunsigned int monthunsigned int day

struct Person char firstname[20]char lastname[20]struct Date birthdayunsigned long id

4758

Object-Oriented

Object-orientation in the C++ sense matches data with codeoperating on it

class Date private

unsigned int yearunsigned int monthunsigned int date

publicvoid setDate(int y int m int d)int getDay()int getMonth()int getYear()

4858

Generic Programming and the STL

The STL promotes generic programming

For example the sequence container types vector dequeand list all support

middot push_back() to insert at the endmiddot pop_back() to remove from the frontmiddot begin() returning an iterator to the first elementmiddot end() returning an iterator to just after the last elementmiddot size() for the number of elements

but only list has push_front() and pop_front()

Other useful containers set multiset map and multimap4958

Generic Programming and the STL

Traversal of containers can be achieved via iterators whichrequire suitable member functions begin() and end()

stdvectorltdoublegtconst_iterator sifor (si=sbegin() si = send() si++)

stdcout ltlt si ltlt stdendl

5058

Generic Programming and the STL

Another key STL part are algorithms

double sum = accumulate(sbegin() send() 0)

Some other STL algorithms are

middot find finds the first element equal to the supplied valuemiddot count counts the number of matching elementsmiddot transform applies a supplied function to each elementmiddot for_each sweeps over all elements does not altermiddot inner_product inner product of two vectors

5158

Template Programming

Template programming provides a lsquolanguage within C++rsquocode gets evaluated during compilation

One of the simplest template examples is

template lttypename Tgtconst Tamp min(const Tamp x const Tamp y)

return y lt x y x

This can now be used to compute the minimum between twoint variables or double or in fact any admissible typeproviding an operatorlt() for less-than comparison

5258

Template Programming

Another template example is a class squaring its argument

template lttypename Tgtclass square public stdunary_functionltTTgt public

T operator()(T t) const return tt

which can be used along with STL algorithms

transform(xbegin() xend() square)

5358

Further Reading

Books by Meyers are excellent

I also like the (free) C++ Annotations

C++ FAQ

Resources on StackOverflow such as

middot general info and its links egmiddot booklist

5458

Debugging

Debugging

Some tips

middot Generally painful old-school printf() still pervasivemiddot Debuggers go along with compilers gdb for gcc and g++lldb for the clang llvm family

middot Extra tools such as valgrind helpful for memory debuggingmiddot ldquoSanitizerrdquo (ASANUBSAN) in newer versions of g++ andclang++

5658

Best Practices

Best Practices

Version control highly recommended to become familiar withgit or svn

Editor in the long-run recommended to learn productivitytricks for one editor emacs vi eclipse RStudio hellip

5858

  • Overview
    • Why R
    • Why C++
    • Vision
    • Speed
      • What Next
      • C++ Basics
        • A Better C
          • Debugging
          • Best Practices
Page 5: Higher-Performance R via C++ - Part 1: Introductiondirk.eddelbuettel.com/papers/rcpp_uzuerich_2015_part1_intro.pdf · · C++11 is a big deal giving us new language features · While

Before the WhyHowWhat

Maybe some mutual introductions

middot Your background (academic industry hellip)middot R experience (beginner intermediate advanced hellip)middot Created modified any R packages middot C andor C++ experience middot Main interest in Rcpp speed extensions hellip middot Following rcpp-devel or r-devel

458

Overview Why R

Why R Programming with Data

ChambersComputationalMethods for DataAnalysis Wiley1977

Becker Chambersand Wilks TheNew S LanguageChapman amp Hall1988

Chambers andHastie StatisticalModels in SChapman amp Hall1992

ChambersProgramming withData Springer1998

ChambersSoftware for DataAnalysisProgramming withR Springer 2008

Thanks to John Chambers for sending me high-resolution scans of the covers of his books

658

A Simple Example

xx lt- faithful[eruptions]fit lt- density(xx)plot(fit)

758

A Simple Example

1 2 3 4 5 6

00

01

02

03

04

05

densitydefault(x = xx)

N = 272 Bandwidth = 03348

Den

sity

858

A Simple Example - Refined

xx lt- faithful[eruptions]fit1 lt- density(xx)fit2 lt- replicate(10000

x lt- sample(xxreplace=TRUE)density(x from=min(fit1$x) to=max(fit1$x))$y

)fit3 lt- apply(fit2 1 quantilec(00250975))plot(fit1 ylim=range(fit3))polygon(c(fit1$xrev(fit1$x)) c(fit3[1]rev(fit3[2]))

col=grey border=F)lines(fit1)

958

A Simple Example - Refined

1 2 3 4 5 6

00

01

02

03

04

05

densitydefault(x = xx)

N = 272 Bandwidth = 03348

Den

sity

1058

So Why R

R enables us to

middot work interactivelymiddot explore and visualize datamiddot access retrieve andor generate datamiddot summarize and report into pdf html hellip

making it the key language for statistical computing and apreferred environment for many data analysts

1158

So Why R

R has always been extensible via

middot C via a bare-bones interface described in Writing RExtensions

middot Fortran which is also used internally by Rmiddot Java via rJava by Simon Urbanekmiddot C++ but essentially at the bare-bones level of C

So while in theory this always worked ndash it was tedious inpractice

1258

Why Extend R

Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran

Since the core of R is in fact a program written in the Clanguage itrsquos not surprising that the most direct interfaceto non-R software is for code written in C or directlycallable from C All the same including additional C codeis a serious step with some added dangers and often asubstantial amount of programming and debuggingrequired You should have a good reason

1358

Why Extend R

Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran

Since the core of R is in fact a program written in the Clanguage itrsquos not surprising that the most direct interfaceto non-R software is for code written in C or directlycallable from C All the same including additional C codeis a serious step with some added dangers and often asubstantial amount of programming and debuggingrequired You should have a good reason

1458

Why Extend R

Chambers proceeds with this rough map of the road ahead

middot Againstmiddot Itrsquos more workmiddot Bugs will bitemiddot Potential platform dependencymiddot Less readable software

middot In Favormiddot New and trusted computationsmiddot Speedmiddot Object references

1558

Why Extend R

The Why boils down to

middot speed Often a good enough reason for us hellip and a focus forus in this workshop

middot new things We can bind to libraries and tools that wouldotherwise be unavailable in R

middot references Chambers quote from 2008 foreshadowed thework on the new Reference Classes now in R and built uponvia Rcpp Modules Rcpp Classes (and also RcppR6)

1658

Overview Why C++

Why C++

middot Asking Google leads to about 52 million hitsmiddot Wikipedia C++ is a statically typed free-form

multi-paradigm compiled general-purpose powerfulprogramming language

middot C++ is industrial-strength vendor-independent widely-usedand still evolving

middot In science amp research one of the most frequently-usedlanguages If there is something you want to use connectto it probably has a CC++ API

middot As a widely used language it also has good tool support(debuggers profilers code analysis)

1858

Why C++

Scott Meyers View C++ as a federation of languages

middot C provides a rich inheritance and interoperability as UnixWindows hellip are all build on C

middot Object-Oriented C++ (maybe just to provide endlessdiscussions about exactly what OO is or should be)

middot Templated C++ which is mighty powerful template metaprogramming unequalled in other languages

middot The Standard Template Library (STL) is a specific templatelibrary which is powerful but has its own conventions

middot C++11 (and C++14 and beyond) add enough to be calleda fifth language

NB Meyers original list of four languages appeared years before C++11

1958

Why C++

middot Mature yet currentmiddot Strong performance focus

middot You donrsquot pay for what you donrsquot usemiddot Leave no room for another language between the machine

level and C++

middot Yet also powerfully abstract and high-levelmiddot C++11 is a big deal giving us new language featuresmiddot While there are complexities Rcpp users are mostly shielded

2058

Overview Vision

Bell Labs May 1976

2258

Interface Vision

R offers us the best of both worlds

middot Compiled code withmiddot Access to proven libraries and algorithms in CC++Fortranmiddot Extremely high performance (in both serial and parallel modes)

middot Interpreted code withmiddot An accessible high-level language made for Programming with

Datamiddot An interactive workflow for data analysismiddot Support for rapid prototyping research and experimentation

2358

Why Rcpp

middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples

middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure

middot Expressive as it allows for vectorised C++ using Rcpp Sugarmiddot Seamless access to all R objects vector matrix list

S3S4RefClass Environment Function hellipmiddot Speed gains for a variety of tasks Rcpp excels precisely

where R struggles loops function calls hellipmiddot Extensions greatly facilitates access to external libraries

using eg Rcpp modules

2458

Overview Speed

Speed Example 1 (due to Christian Robert)

Five different ways to compute 1(1 + x)

f lt- function(n x=1) for(i in 1n) x lt- 1(1+x)g lt- function(n x=1) for(i in 1n) x lt- (1(1+x))h lt- function(n x=1) for(i in 1n) x lt- (1+x)^(-1)j lt- function(n x=1) for(i in 1n) x lt- 11+xk lt- function(n x=1) for(i in 1n) x lt- 11+xlibrary(rbenchmark)N lt- 1e5benchmark(f(N1)g(N1)h(N1)j(N1)k(N1)order=relative)[14]

2658

Speed Example 1 (due to Christian Robert)

test replications elapsed relative 5 k(N 1) 100 6435 1000 1 f(N 1) 100 6609 1027 2 g(N 1) 100 7757 1205 4 j(N 1) 100 7882 1225 3 h(N 1) 100 11766 1828

2758

Speed Example 1 (due to Christian Robert)

Adding a C++ variant is easy

cppFunction(double m(int n double x)

for (int i=0 iltn i++)x = 1 (1+x)

return x

)

(We will learn more about cppFunction() later)

2858

Speed Example 1 (due to Christian Robert)

test replications elapsed relative 6 m(N 1) 100 0170 1000 1 f(N 1) 100 6854 40318 5 k(N 1) 100 7811 45947 4 j(N 1) 100 9183 54018 2 g(N 1) 100 9489 55818 3 h(N 1) 100 11725 68971

2958

Speed Example 2 (due to StackOverflow)

Consider a function defined as

f(n) such that n when n lt 2

f(n minus 1) + f(n minus 2) when n ge 2

3058

Speed Example 2 (due to StackOverflow)

R implementation and use

f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))

Using it on first 11 argumentssapply(010 f)

[1] 0 1 1 2 3 5 8 13 21 34 55

3158

Speed Example 2 (due to StackOverflow)

Timing

library(rbenchmark)benchmark(f(10) f(15) f(20))[14]

test replications elapsed relative 1 f(10) 100 0030 1000 2 f(15) 100 0335 11167 3 f(20) 100 3517 117233

3258

Speed Example 2 (due to StackOverflow)

int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))

deployed as

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

3358

Speed Example 2 (due to StackOverflow)

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(20) g(20) order=relative)[14]

test replications elapsed relative 2 g(20) 100 0011 1000 1 f(20) 100 3776 343273

A nice gain of a few orders of magnitude3458

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

3558

What Next

Hands-on

Programming with C++

middot C++ Basicsmiddot Debuggingmiddot Best Practices

and then on to Rcpp itself

3758

C++ Basics

Compiled not Interpreted

Need to compile and link

include ltcstdiogt

int main(void) printf(Hello worldn)return 0

3958

Compiled not Interpreted

Or streams output rather than printf

include ltiostreamgt

int main(void) stdcout ltlt Hello world ltlt stdendlreturn 0

4058

Compiled not Interpreted

g++ -o will compile and link

We will now look at an examples with explicit linking

4158

Compiled not Interpreted

include ltcstdiogt

define MATHLIB_STANDALONEinclude ltRmathhgt

int main(void) printf(N(01) 95th percentile 98fn

qnorm(095 00 10 1 0))

4258

Compiled not Interpreted

We may need to supply

middot header location via -Imiddot library location via -Lmiddot library via -llibraryname

g++ -Iusrinclude -c qnorm_rmathcppg++ -o qnorm_rmath qnorm_rmatho -Lusrlib -lRmath

4358

Statically Typed

middot R is dynamically typed x lt- 314 x lt- foo is validmiddot In C++ each variable must be declared before first usemiddot Common types are int and long (possibly with unsigned)float and double bool as well as char

middot No standard string type though stdstring is closemiddot All these variables types are scalars which is fundamentally

different from R where everything is a vectormiddot class (and struct) allow creation of composite types

classes add behaviour to data to form objectsmiddot Variables need to be declared cannot change

4458

C++ Basics A Better C

C++ is a Better C

middot control structures similar to what R offers for while ifswitch

middot functions are similar too but note the difference inpositional-only matching also same function name butdifferent arguments allowed in C++

middot pointers and memory management very different but lotsof issues people had with C can be avoided via STL (whichis something Rcpp promotes too)

middot sometimes still useful to know what a pointer is hellip

4658

Object-Oriented

This is a second key feature of C++ and it does it differentlyfrom S3 and S4

struct Date unsigned int yearunsigned int monthunsigned int day

struct Person char firstname[20]char lastname[20]struct Date birthdayunsigned long id

4758

Object-Oriented

Object-orientation in the C++ sense matches data with codeoperating on it

class Date private

unsigned int yearunsigned int monthunsigned int date

publicvoid setDate(int y int m int d)int getDay()int getMonth()int getYear()

4858

Generic Programming and the STL

The STL promotes generic programming

For example the sequence container types vector dequeand list all support

middot push_back() to insert at the endmiddot pop_back() to remove from the frontmiddot begin() returning an iterator to the first elementmiddot end() returning an iterator to just after the last elementmiddot size() for the number of elements

but only list has push_front() and pop_front()

Other useful containers set multiset map and multimap4958

Generic Programming and the STL

Traversal of containers can be achieved via iterators whichrequire suitable member functions begin() and end()

stdvectorltdoublegtconst_iterator sifor (si=sbegin() si = send() si++)

stdcout ltlt si ltlt stdendl

5058

Generic Programming and the STL

Another key STL part are algorithms

double sum = accumulate(sbegin() send() 0)

Some other STL algorithms are

middot find finds the first element equal to the supplied valuemiddot count counts the number of matching elementsmiddot transform applies a supplied function to each elementmiddot for_each sweeps over all elements does not altermiddot inner_product inner product of two vectors

5158

Template Programming

Template programming provides a lsquolanguage within C++rsquocode gets evaluated during compilation

One of the simplest template examples is

template lttypename Tgtconst Tamp min(const Tamp x const Tamp y)

return y lt x y x

This can now be used to compute the minimum between twoint variables or double or in fact any admissible typeproviding an operatorlt() for less-than comparison

5258

Template Programming

Another template example is a class squaring its argument

template lttypename Tgtclass square public stdunary_functionltTTgt public

T operator()(T t) const return tt

which can be used along with STL algorithms

transform(xbegin() xend() square)

5358

Further Reading

Books by Meyers are excellent

I also like the (free) C++ Annotations

C++ FAQ

Resources on StackOverflow such as

middot general info and its links egmiddot booklist

5458

Debugging

Debugging

Some tips

middot Generally painful old-school printf() still pervasivemiddot Debuggers go along with compilers gdb for gcc and g++lldb for the clang llvm family

middot Extra tools such as valgrind helpful for memory debuggingmiddot ldquoSanitizerrdquo (ASANUBSAN) in newer versions of g++ andclang++

5658

Best Practices

Best Practices

Version control highly recommended to become familiar withgit or svn

Editor in the long-run recommended to learn productivitytricks for one editor emacs vi eclipse RStudio hellip

5858

  • Overview
    • Why R
    • Why C++
    • Vision
    • Speed
      • What Next
      • C++ Basics
        • A Better C
          • Debugging
          • Best Practices
Page 6: Higher-Performance R via C++ - Part 1: Introductiondirk.eddelbuettel.com/papers/rcpp_uzuerich_2015_part1_intro.pdf · · C++11 is a big deal giving us new language features · While

Overview Why R

Why R Programming with Data

ChambersComputationalMethods for DataAnalysis Wiley1977

Becker Chambersand Wilks TheNew S LanguageChapman amp Hall1988

Chambers andHastie StatisticalModels in SChapman amp Hall1992

ChambersProgramming withData Springer1998

ChambersSoftware for DataAnalysisProgramming withR Springer 2008

Thanks to John Chambers for sending me high-resolution scans of the covers of his books

658

A Simple Example

xx lt- faithful[eruptions]fit lt- density(xx)plot(fit)

758

A Simple Example

1 2 3 4 5 6

00

01

02

03

04

05

densitydefault(x = xx)

N = 272 Bandwidth = 03348

Den

sity

858

A Simple Example - Refined

xx lt- faithful[eruptions]fit1 lt- density(xx)fit2 lt- replicate(10000

x lt- sample(xxreplace=TRUE)density(x from=min(fit1$x) to=max(fit1$x))$y

)fit3 lt- apply(fit2 1 quantilec(00250975))plot(fit1 ylim=range(fit3))polygon(c(fit1$xrev(fit1$x)) c(fit3[1]rev(fit3[2]))

col=grey border=F)lines(fit1)

958

A Simple Example - Refined

1 2 3 4 5 6

00

01

02

03

04

05

densitydefault(x = xx)

N = 272 Bandwidth = 03348

Den

sity

1058

So Why R

R enables us to

middot work interactivelymiddot explore and visualize datamiddot access retrieve andor generate datamiddot summarize and report into pdf html hellip

making it the key language for statistical computing and apreferred environment for many data analysts

1158

So Why R

R has always been extensible via

middot C via a bare-bones interface described in Writing RExtensions

middot Fortran which is also used internally by Rmiddot Java via rJava by Simon Urbanekmiddot C++ but essentially at the bare-bones level of C

So while in theory this always worked ndash it was tedious inpractice

1258

Why Extend R

Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran

Since the core of R is in fact a program written in the Clanguage itrsquos not surprising that the most direct interfaceto non-R software is for code written in C or directlycallable from C All the same including additional C codeis a serious step with some added dangers and often asubstantial amount of programming and debuggingrequired You should have a good reason

1358

Why Extend R

Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran

Since the core of R is in fact a program written in the Clanguage itrsquos not surprising that the most direct interfaceto non-R software is for code written in C or directlycallable from C All the same including additional C codeis a serious step with some added dangers and often asubstantial amount of programming and debuggingrequired You should have a good reason

1458

Why Extend R

Chambers proceeds with this rough map of the road ahead

middot Againstmiddot Itrsquos more workmiddot Bugs will bitemiddot Potential platform dependencymiddot Less readable software

middot In Favormiddot New and trusted computationsmiddot Speedmiddot Object references

1558

Why Extend R

The Why boils down to

middot speed Often a good enough reason for us hellip and a focus forus in this workshop

middot new things We can bind to libraries and tools that wouldotherwise be unavailable in R

middot references Chambers quote from 2008 foreshadowed thework on the new Reference Classes now in R and built uponvia Rcpp Modules Rcpp Classes (and also RcppR6)

1658

Overview Why C++

Why C++

middot Asking Google leads to about 52 million hitsmiddot Wikipedia C++ is a statically typed free-form

multi-paradigm compiled general-purpose powerfulprogramming language

middot C++ is industrial-strength vendor-independent widely-usedand still evolving

middot In science amp research one of the most frequently-usedlanguages If there is something you want to use connectto it probably has a CC++ API

middot As a widely used language it also has good tool support(debuggers profilers code analysis)

1858

Why C++

Scott Meyers View C++ as a federation of languages

middot C provides a rich inheritance and interoperability as UnixWindows hellip are all build on C

middot Object-Oriented C++ (maybe just to provide endlessdiscussions about exactly what OO is or should be)

middot Templated C++ which is mighty powerful template metaprogramming unequalled in other languages

middot The Standard Template Library (STL) is a specific templatelibrary which is powerful but has its own conventions

middot C++11 (and C++14 and beyond) add enough to be calleda fifth language

NB Meyers original list of four languages appeared years before C++11

1958

Why C++

middot Mature yet currentmiddot Strong performance focus

middot You donrsquot pay for what you donrsquot usemiddot Leave no room for another language between the machine

level and C++

middot Yet also powerfully abstract and high-levelmiddot C++11 is a big deal giving us new language featuresmiddot While there are complexities Rcpp users are mostly shielded

2058

Overview Vision

Bell Labs May 1976

2258

Interface Vision

R offers us the best of both worlds

middot Compiled code withmiddot Access to proven libraries and algorithms in CC++Fortranmiddot Extremely high performance (in both serial and parallel modes)

middot Interpreted code withmiddot An accessible high-level language made for Programming with

Datamiddot An interactive workflow for data analysismiddot Support for rapid prototyping research and experimentation

2358

Why Rcpp

middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples

middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure

middot Expressive as it allows for vectorised C++ using Rcpp Sugarmiddot Seamless access to all R objects vector matrix list

S3S4RefClass Environment Function hellipmiddot Speed gains for a variety of tasks Rcpp excels precisely

where R struggles loops function calls hellipmiddot Extensions greatly facilitates access to external libraries

using eg Rcpp modules

2458

Overview Speed

Speed Example 1 (due to Christian Robert)

Five different ways to compute 1(1 + x)

f lt- function(n x=1) for(i in 1n) x lt- 1(1+x)g lt- function(n x=1) for(i in 1n) x lt- (1(1+x))h lt- function(n x=1) for(i in 1n) x lt- (1+x)^(-1)j lt- function(n x=1) for(i in 1n) x lt- 11+xk lt- function(n x=1) for(i in 1n) x lt- 11+xlibrary(rbenchmark)N lt- 1e5benchmark(f(N1)g(N1)h(N1)j(N1)k(N1)order=relative)[14]

2658

Speed Example 1 (due to Christian Robert)

test replications elapsed relative 5 k(N 1) 100 6435 1000 1 f(N 1) 100 6609 1027 2 g(N 1) 100 7757 1205 4 j(N 1) 100 7882 1225 3 h(N 1) 100 11766 1828

2758

Speed Example 1 (due to Christian Robert)

Adding a C++ variant is easy

cppFunction(double m(int n double x)

for (int i=0 iltn i++)x = 1 (1+x)

return x

)

(We will learn more about cppFunction() later)

2858

Speed Example 1 (due to Christian Robert)

test replications elapsed relative 6 m(N 1) 100 0170 1000 1 f(N 1) 100 6854 40318 5 k(N 1) 100 7811 45947 4 j(N 1) 100 9183 54018 2 g(N 1) 100 9489 55818 3 h(N 1) 100 11725 68971

2958

Speed Example 2 (due to StackOverflow)

Consider a function defined as

f(n) such that n when n lt 2

f(n minus 1) + f(n minus 2) when n ge 2

3058

Speed Example 2 (due to StackOverflow)

R implementation and use

f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))

Using it on first 11 argumentssapply(010 f)

[1] 0 1 1 2 3 5 8 13 21 34 55

3158

Speed Example 2 (due to StackOverflow)

Timing

library(rbenchmark)benchmark(f(10) f(15) f(20))[14]

test replications elapsed relative 1 f(10) 100 0030 1000 2 f(15) 100 0335 11167 3 f(20) 100 3517 117233

3258

Speed Example 2 (due to StackOverflow)

int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))

deployed as

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

3358

Speed Example 2 (due to StackOverflow)

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(20) g(20) order=relative)[14]

test replications elapsed relative 2 g(20) 100 0011 1000 1 f(20) 100 3776 343273

A nice gain of a few orders of magnitude3458

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

3558

What Next

Hands-on

Programming with C++

middot C++ Basicsmiddot Debuggingmiddot Best Practices

and then on to Rcpp itself

3758

C++ Basics

Compiled not Interpreted

Need to compile and link

include ltcstdiogt

int main(void) printf(Hello worldn)return 0

3958

Compiled not Interpreted

Or streams output rather than printf

include ltiostreamgt

int main(void) stdcout ltlt Hello world ltlt stdendlreturn 0

4058

Compiled not Interpreted

g++ -o will compile and link

We will now look at an examples with explicit linking

4158

Compiled not Interpreted

include ltcstdiogt

define MATHLIB_STANDALONEinclude ltRmathhgt

int main(void) printf(N(01) 95th percentile 98fn

qnorm(095 00 10 1 0))

4258

Compiled not Interpreted

We may need to supply

middot header location via -Imiddot library location via -Lmiddot library via -llibraryname

g++ -Iusrinclude -c qnorm_rmathcppg++ -o qnorm_rmath qnorm_rmatho -Lusrlib -lRmath

4358

Statically Typed

middot R is dynamically typed x lt- 314 x lt- foo is validmiddot In C++ each variable must be declared before first usemiddot Common types are int and long (possibly with unsigned)float and double bool as well as char

middot No standard string type though stdstring is closemiddot All these variables types are scalars which is fundamentally

different from R where everything is a vectormiddot class (and struct) allow creation of composite types

classes add behaviour to data to form objectsmiddot Variables need to be declared cannot change

4458

C++ Basics A Better C

C++ is a Better C

middot control structures similar to what R offers for while ifswitch

middot functions are similar too but note the difference inpositional-only matching also same function name butdifferent arguments allowed in C++

middot pointers and memory management very different but lotsof issues people had with C can be avoided via STL (whichis something Rcpp promotes too)

middot sometimes still useful to know what a pointer is hellip

4658

Object-Oriented

This is a second key feature of C++ and it does it differentlyfrom S3 and S4

struct Date unsigned int yearunsigned int monthunsigned int day

struct Person char firstname[20]char lastname[20]struct Date birthdayunsigned long id

4758

Object-Oriented

Object-orientation in the C++ sense matches data with codeoperating on it

class Date private

unsigned int yearunsigned int monthunsigned int date

publicvoid setDate(int y int m int d)int getDay()int getMonth()int getYear()

4858

Generic Programming and the STL

The STL promotes generic programming

For example the sequence container types vector dequeand list all support

middot push_back() to insert at the endmiddot pop_back() to remove from the frontmiddot begin() returning an iterator to the first elementmiddot end() returning an iterator to just after the last elementmiddot size() for the number of elements

but only list has push_front() and pop_front()

Other useful containers set multiset map and multimap4958

Generic Programming and the STL

Traversal of containers can be achieved via iterators whichrequire suitable member functions begin() and end()

stdvectorltdoublegtconst_iterator sifor (si=sbegin() si = send() si++)

stdcout ltlt si ltlt stdendl

5058

Generic Programming and the STL

Another key STL part are algorithms

double sum = accumulate(sbegin() send() 0)

Some other STL algorithms are

middot find finds the first element equal to the supplied valuemiddot count counts the number of matching elementsmiddot transform applies a supplied function to each elementmiddot for_each sweeps over all elements does not altermiddot inner_product inner product of two vectors

5158

Template Programming

Template programming provides a lsquolanguage within C++rsquocode gets evaluated during compilation

One of the simplest template examples is

template lttypename Tgtconst Tamp min(const Tamp x const Tamp y)

return y lt x y x

This can now be used to compute the minimum between twoint variables or double or in fact any admissible typeproviding an operatorlt() for less-than comparison

5258

Template Programming

Another template example is a class squaring its argument

template lttypename Tgtclass square public stdunary_functionltTTgt public

T operator()(T t) const return tt

which can be used along with STL algorithms

transform(xbegin() xend() square)

5358

Further Reading

Books by Meyers are excellent

I also like the (free) C++ Annotations

C++ FAQ

Resources on StackOverflow such as

middot general info and its links egmiddot booklist

5458

Debugging

Debugging

Some tips

middot Generally painful old-school printf() still pervasivemiddot Debuggers go along with compilers gdb for gcc and g++lldb for the clang llvm family

middot Extra tools such as valgrind helpful for memory debuggingmiddot ldquoSanitizerrdquo (ASANUBSAN) in newer versions of g++ andclang++

5658

Best Practices

Best Practices

Version control highly recommended to become familiar withgit or svn

Editor in the long-run recommended to learn productivitytricks for one editor emacs vi eclipse RStudio hellip

5858

  • Overview
    • Why R
    • Why C++
    • Vision
    • Speed
      • What Next
      • C++ Basics
        • A Better C
          • Debugging
          • Best Practices
Page 7: Higher-Performance R via C++ - Part 1: Introductiondirk.eddelbuettel.com/papers/rcpp_uzuerich_2015_part1_intro.pdf · · C++11 is a big deal giving us new language features · While

Why R Programming with Data

ChambersComputationalMethods for DataAnalysis Wiley1977

Becker Chambersand Wilks TheNew S LanguageChapman amp Hall1988

Chambers andHastie StatisticalModels in SChapman amp Hall1992

ChambersProgramming withData Springer1998

ChambersSoftware for DataAnalysisProgramming withR Springer 2008

Thanks to John Chambers for sending me high-resolution scans of the covers of his books

658

A Simple Example

xx lt- faithful[eruptions]fit lt- density(xx)plot(fit)

758

A Simple Example

1 2 3 4 5 6

00

01

02

03

04

05

densitydefault(x = xx)

N = 272 Bandwidth = 03348

Den

sity

858

A Simple Example - Refined

xx lt- faithful[eruptions]fit1 lt- density(xx)fit2 lt- replicate(10000

x lt- sample(xxreplace=TRUE)density(x from=min(fit1$x) to=max(fit1$x))$y

)fit3 lt- apply(fit2 1 quantilec(00250975))plot(fit1 ylim=range(fit3))polygon(c(fit1$xrev(fit1$x)) c(fit3[1]rev(fit3[2]))

col=grey border=F)lines(fit1)

958

A Simple Example - Refined

1 2 3 4 5 6

00

01

02

03

04

05

densitydefault(x = xx)

N = 272 Bandwidth = 03348

Den

sity

1058

So Why R

R enables us to

middot work interactivelymiddot explore and visualize datamiddot access retrieve andor generate datamiddot summarize and report into pdf html hellip

making it the key language for statistical computing and apreferred environment for many data analysts

1158

So Why R

R has always been extensible via

middot C via a bare-bones interface described in Writing RExtensions

middot Fortran which is also used internally by Rmiddot Java via rJava by Simon Urbanekmiddot C++ but essentially at the bare-bones level of C

So while in theory this always worked ndash it was tedious inpractice

1258

Why Extend R

Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran

Since the core of R is in fact a program written in the Clanguage itrsquos not surprising that the most direct interfaceto non-R software is for code written in C or directlycallable from C All the same including additional C codeis a serious step with some added dangers and often asubstantial amount of programming and debuggingrequired You should have a good reason

1358

Why Extend R

Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran

Since the core of R is in fact a program written in the Clanguage itrsquos not surprising that the most direct interfaceto non-R software is for code written in C or directlycallable from C All the same including additional C codeis a serious step with some added dangers and often asubstantial amount of programming and debuggingrequired You should have a good reason

1458

Why Extend R

Chambers proceeds with this rough map of the road ahead

middot Againstmiddot Itrsquos more workmiddot Bugs will bitemiddot Potential platform dependencymiddot Less readable software

middot In Favormiddot New and trusted computationsmiddot Speedmiddot Object references

1558

Why Extend R

The Why boils down to

middot speed Often a good enough reason for us hellip and a focus forus in this workshop

middot new things We can bind to libraries and tools that wouldotherwise be unavailable in R

middot references Chambers quote from 2008 foreshadowed thework on the new Reference Classes now in R and built uponvia Rcpp Modules Rcpp Classes (and also RcppR6)

1658

Overview Why C++

Why C++

middot Asking Google leads to about 52 million hitsmiddot Wikipedia C++ is a statically typed free-form

multi-paradigm compiled general-purpose powerfulprogramming language

middot C++ is industrial-strength vendor-independent widely-usedand still evolving

middot In science amp research one of the most frequently-usedlanguages If there is something you want to use connectto it probably has a CC++ API

middot As a widely used language it also has good tool support(debuggers profilers code analysis)

1858

Why C++

Scott Meyers View C++ as a federation of languages

middot C provides a rich inheritance and interoperability as UnixWindows hellip are all build on C

middot Object-Oriented C++ (maybe just to provide endlessdiscussions about exactly what OO is or should be)

middot Templated C++ which is mighty powerful template metaprogramming unequalled in other languages

middot The Standard Template Library (STL) is a specific templatelibrary which is powerful but has its own conventions

middot C++11 (and C++14 and beyond) add enough to be calleda fifth language

NB Meyers original list of four languages appeared years before C++11

1958

Why C++

middot Mature yet currentmiddot Strong performance focus

middot You donrsquot pay for what you donrsquot usemiddot Leave no room for another language between the machine

level and C++

middot Yet also powerfully abstract and high-levelmiddot C++11 is a big deal giving us new language featuresmiddot While there are complexities Rcpp users are mostly shielded

2058

Overview Vision

Bell Labs May 1976

2258

Interface Vision

R offers us the best of both worlds

middot Compiled code withmiddot Access to proven libraries and algorithms in CC++Fortranmiddot Extremely high performance (in both serial and parallel modes)

middot Interpreted code withmiddot An accessible high-level language made for Programming with

Datamiddot An interactive workflow for data analysismiddot Support for rapid prototyping research and experimentation

2358

Why Rcpp

middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples

middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure

middot Expressive as it allows for vectorised C++ using Rcpp Sugarmiddot Seamless access to all R objects vector matrix list

S3S4RefClass Environment Function hellipmiddot Speed gains for a variety of tasks Rcpp excels precisely

where R struggles loops function calls hellipmiddot Extensions greatly facilitates access to external libraries

using eg Rcpp modules

2458

Overview Speed

Speed Example 1 (due to Christian Robert)

Five different ways to compute 1(1 + x)

f lt- function(n x=1) for(i in 1n) x lt- 1(1+x)g lt- function(n x=1) for(i in 1n) x lt- (1(1+x))h lt- function(n x=1) for(i in 1n) x lt- (1+x)^(-1)j lt- function(n x=1) for(i in 1n) x lt- 11+xk lt- function(n x=1) for(i in 1n) x lt- 11+xlibrary(rbenchmark)N lt- 1e5benchmark(f(N1)g(N1)h(N1)j(N1)k(N1)order=relative)[14]

2658

Speed Example 1 (due to Christian Robert)

test replications elapsed relative 5 k(N 1) 100 6435 1000 1 f(N 1) 100 6609 1027 2 g(N 1) 100 7757 1205 4 j(N 1) 100 7882 1225 3 h(N 1) 100 11766 1828

2758

Speed Example 1 (due to Christian Robert)

Adding a C++ variant is easy

cppFunction(double m(int n double x)

for (int i=0 iltn i++)x = 1 (1+x)

return x

)

(We will learn more about cppFunction() later)

2858

Speed Example 1 (due to Christian Robert)

test replications elapsed relative 6 m(N 1) 100 0170 1000 1 f(N 1) 100 6854 40318 5 k(N 1) 100 7811 45947 4 j(N 1) 100 9183 54018 2 g(N 1) 100 9489 55818 3 h(N 1) 100 11725 68971

2958

Speed Example 2 (due to StackOverflow)

Consider a function defined as

f(n) such that n when n lt 2

f(n minus 1) + f(n minus 2) when n ge 2

3058

Speed Example 2 (due to StackOverflow)

R implementation and use

f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))

Using it on first 11 argumentssapply(010 f)

[1] 0 1 1 2 3 5 8 13 21 34 55

3158

Speed Example 2 (due to StackOverflow)

Timing

library(rbenchmark)benchmark(f(10) f(15) f(20))[14]

test replications elapsed relative 1 f(10) 100 0030 1000 2 f(15) 100 0335 11167 3 f(20) 100 3517 117233

3258

Speed Example 2 (due to StackOverflow)

int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))

deployed as

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

3358

Speed Example 2 (due to StackOverflow)

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(20) g(20) order=relative)[14]

test replications elapsed relative 2 g(20) 100 0011 1000 1 f(20) 100 3776 343273

A nice gain of a few orders of magnitude3458

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

3558

What Next

Hands-on

Programming with C++

middot C++ Basicsmiddot Debuggingmiddot Best Practices

and then on to Rcpp itself

3758

C++ Basics

Compiled not Interpreted

Need to compile and link

include ltcstdiogt

int main(void) printf(Hello worldn)return 0

3958

Compiled not Interpreted

Or streams output rather than printf

include ltiostreamgt

int main(void) stdcout ltlt Hello world ltlt stdendlreturn 0

4058

Compiled not Interpreted

g++ -o will compile and link

We will now look at an examples with explicit linking

4158

Compiled not Interpreted

include ltcstdiogt

define MATHLIB_STANDALONEinclude ltRmathhgt

int main(void) printf(N(01) 95th percentile 98fn

qnorm(095 00 10 1 0))

4258

Compiled not Interpreted

We may need to supply

middot header location via -Imiddot library location via -Lmiddot library via -llibraryname

g++ -Iusrinclude -c qnorm_rmathcppg++ -o qnorm_rmath qnorm_rmatho -Lusrlib -lRmath

4358

Statically Typed

middot R is dynamically typed x lt- 314 x lt- foo is validmiddot In C++ each variable must be declared before first usemiddot Common types are int and long (possibly with unsigned)float and double bool as well as char

middot No standard string type though stdstring is closemiddot All these variables types are scalars which is fundamentally

different from R where everything is a vectormiddot class (and struct) allow creation of composite types

classes add behaviour to data to form objectsmiddot Variables need to be declared cannot change

4458

C++ Basics A Better C

C++ is a Better C

middot control structures similar to what R offers for while ifswitch

middot functions are similar too but note the difference inpositional-only matching also same function name butdifferent arguments allowed in C++

middot pointers and memory management very different but lotsof issues people had with C can be avoided via STL (whichis something Rcpp promotes too)

middot sometimes still useful to know what a pointer is hellip

4658

Object-Oriented

This is a second key feature of C++ and it does it differentlyfrom S3 and S4

struct Date unsigned int yearunsigned int monthunsigned int day

struct Person char firstname[20]char lastname[20]struct Date birthdayunsigned long id

4758

Object-Oriented

Object-orientation in the C++ sense matches data with codeoperating on it

class Date private

unsigned int yearunsigned int monthunsigned int date

publicvoid setDate(int y int m int d)int getDay()int getMonth()int getYear()

4858

Generic Programming and the STL

The STL promotes generic programming

For example the sequence container types vector dequeand list all support

middot push_back() to insert at the endmiddot pop_back() to remove from the frontmiddot begin() returning an iterator to the first elementmiddot end() returning an iterator to just after the last elementmiddot size() for the number of elements

but only list has push_front() and pop_front()

Other useful containers set multiset map and multimap4958

Generic Programming and the STL

Traversal of containers can be achieved via iterators whichrequire suitable member functions begin() and end()

stdvectorltdoublegtconst_iterator sifor (si=sbegin() si = send() si++)

stdcout ltlt si ltlt stdendl

5058

Generic Programming and the STL

Another key STL part are algorithms

double sum = accumulate(sbegin() send() 0)

Some other STL algorithms are

middot find finds the first element equal to the supplied valuemiddot count counts the number of matching elementsmiddot transform applies a supplied function to each elementmiddot for_each sweeps over all elements does not altermiddot inner_product inner product of two vectors

5158

Template Programming

Template programming provides a lsquolanguage within C++rsquocode gets evaluated during compilation

One of the simplest template examples is

template lttypename Tgtconst Tamp min(const Tamp x const Tamp y)

return y lt x y x

This can now be used to compute the minimum between twoint variables or double or in fact any admissible typeproviding an operatorlt() for less-than comparison

5258

Template Programming

Another template example is a class squaring its argument

template lttypename Tgtclass square public stdunary_functionltTTgt public

T operator()(T t) const return tt

which can be used along with STL algorithms

transform(xbegin() xend() square)

5358

Further Reading

Books by Meyers are excellent

I also like the (free) C++ Annotations

C++ FAQ

Resources on StackOverflow such as

middot general info and its links egmiddot booklist

5458

Debugging

Debugging

Some tips

middot Generally painful old-school printf() still pervasivemiddot Debuggers go along with compilers gdb for gcc and g++lldb for the clang llvm family

middot Extra tools such as valgrind helpful for memory debuggingmiddot ldquoSanitizerrdquo (ASANUBSAN) in newer versions of g++ andclang++

5658

Best Practices

Best Practices

Version control highly recommended to become familiar withgit or svn

Editor in the long-run recommended to learn productivitytricks for one editor emacs vi eclipse RStudio hellip

5858

  • Overview
    • Why R
    • Why C++
    • Vision
    • Speed
      • What Next
      • C++ Basics
        • A Better C
          • Debugging
          • Best Practices
Page 8: Higher-Performance R via C++ - Part 1: Introductiondirk.eddelbuettel.com/papers/rcpp_uzuerich_2015_part1_intro.pdf · · C++11 is a big deal giving us new language features · While

A Simple Example

xx lt- faithful[eruptions]fit lt- density(xx)plot(fit)

758

A Simple Example

1 2 3 4 5 6

00

01

02

03

04

05

densitydefault(x = xx)

N = 272 Bandwidth = 03348

Den

sity

858

A Simple Example - Refined

xx lt- faithful[eruptions]fit1 lt- density(xx)fit2 lt- replicate(10000

x lt- sample(xxreplace=TRUE)density(x from=min(fit1$x) to=max(fit1$x))$y

)fit3 lt- apply(fit2 1 quantilec(00250975))plot(fit1 ylim=range(fit3))polygon(c(fit1$xrev(fit1$x)) c(fit3[1]rev(fit3[2]))

col=grey border=F)lines(fit1)

958

A Simple Example - Refined

1 2 3 4 5 6

00

01

02

03

04

05

densitydefault(x = xx)

N = 272 Bandwidth = 03348

Den

sity

1058

So Why R

R enables us to

middot work interactivelymiddot explore and visualize datamiddot access retrieve andor generate datamiddot summarize and report into pdf html hellip

making it the key language for statistical computing and apreferred environment for many data analysts

1158

So Why R

R has always been extensible via

middot C via a bare-bones interface described in Writing RExtensions

middot Fortran which is also used internally by Rmiddot Java via rJava by Simon Urbanekmiddot C++ but essentially at the bare-bones level of C

So while in theory this always worked ndash it was tedious inpractice

1258

Why Extend R

Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran

Since the core of R is in fact a program written in the Clanguage itrsquos not surprising that the most direct interfaceto non-R software is for code written in C or directlycallable from C All the same including additional C codeis a serious step with some added dangers and often asubstantial amount of programming and debuggingrequired You should have a good reason

1358

Why Extend R

Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran

Since the core of R is in fact a program written in the Clanguage itrsquos not surprising that the most direct interfaceto non-R software is for code written in C or directlycallable from C All the same including additional C codeis a serious step with some added dangers and often asubstantial amount of programming and debuggingrequired You should have a good reason

1458

Why Extend R

Chambers proceeds with this rough map of the road ahead

middot Againstmiddot Itrsquos more workmiddot Bugs will bitemiddot Potential platform dependencymiddot Less readable software

middot In Favormiddot New and trusted computationsmiddot Speedmiddot Object references

1558

Why Extend R

The Why boils down to

middot speed Often a good enough reason for us hellip and a focus forus in this workshop

middot new things We can bind to libraries and tools that wouldotherwise be unavailable in R

middot references Chambers quote from 2008 foreshadowed thework on the new Reference Classes now in R and built uponvia Rcpp Modules Rcpp Classes (and also RcppR6)

1658

Overview Why C++

Why C++

middot Asking Google leads to about 52 million hitsmiddot Wikipedia C++ is a statically typed free-form

multi-paradigm compiled general-purpose powerfulprogramming language

middot C++ is industrial-strength vendor-independent widely-usedand still evolving

middot In science amp research one of the most frequently-usedlanguages If there is something you want to use connectto it probably has a CC++ API

middot As a widely used language it also has good tool support(debuggers profilers code analysis)

1858

Why C++

Scott Meyers View C++ as a federation of languages

middot C provides a rich inheritance and interoperability as UnixWindows hellip are all build on C

middot Object-Oriented C++ (maybe just to provide endlessdiscussions about exactly what OO is or should be)

middot Templated C++ which is mighty powerful template metaprogramming unequalled in other languages

middot The Standard Template Library (STL) is a specific templatelibrary which is powerful but has its own conventions

middot C++11 (and C++14 and beyond) add enough to be calleda fifth language

NB Meyers original list of four languages appeared years before C++11

1958

Why C++

middot Mature yet currentmiddot Strong performance focus

middot You donrsquot pay for what you donrsquot usemiddot Leave no room for another language between the machine

level and C++

middot Yet also powerfully abstract and high-levelmiddot C++11 is a big deal giving us new language featuresmiddot While there are complexities Rcpp users are mostly shielded

2058

Overview Vision

Bell Labs May 1976

2258

Interface Vision

R offers us the best of both worlds

middot Compiled code withmiddot Access to proven libraries and algorithms in CC++Fortranmiddot Extremely high performance (in both serial and parallel modes)

middot Interpreted code withmiddot An accessible high-level language made for Programming with

Datamiddot An interactive workflow for data analysismiddot Support for rapid prototyping research and experimentation

2358

Why Rcpp

middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples

middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure

middot Expressive as it allows for vectorised C++ using Rcpp Sugarmiddot Seamless access to all R objects vector matrix list

S3S4RefClass Environment Function hellipmiddot Speed gains for a variety of tasks Rcpp excels precisely

where R struggles loops function calls hellipmiddot Extensions greatly facilitates access to external libraries

using eg Rcpp modules

2458

Overview Speed

Speed Example 1 (due to Christian Robert)

Five different ways to compute 1(1 + x)

f lt- function(n x=1) for(i in 1n) x lt- 1(1+x)g lt- function(n x=1) for(i in 1n) x lt- (1(1+x))h lt- function(n x=1) for(i in 1n) x lt- (1+x)^(-1)j lt- function(n x=1) for(i in 1n) x lt- 11+xk lt- function(n x=1) for(i in 1n) x lt- 11+xlibrary(rbenchmark)N lt- 1e5benchmark(f(N1)g(N1)h(N1)j(N1)k(N1)order=relative)[14]

2658

Speed Example 1 (due to Christian Robert)

test replications elapsed relative 5 k(N 1) 100 6435 1000 1 f(N 1) 100 6609 1027 2 g(N 1) 100 7757 1205 4 j(N 1) 100 7882 1225 3 h(N 1) 100 11766 1828

2758

Speed Example 1 (due to Christian Robert)

Adding a C++ variant is easy

cppFunction(double m(int n double x)

for (int i=0 iltn i++)x = 1 (1+x)

return x

)

(We will learn more about cppFunction() later)

2858

Speed Example 1 (due to Christian Robert)

test replications elapsed relative 6 m(N 1) 100 0170 1000 1 f(N 1) 100 6854 40318 5 k(N 1) 100 7811 45947 4 j(N 1) 100 9183 54018 2 g(N 1) 100 9489 55818 3 h(N 1) 100 11725 68971

2958

Speed Example 2 (due to StackOverflow)

Consider a function defined as

f(n) such that n when n lt 2

f(n minus 1) + f(n minus 2) when n ge 2

3058

Speed Example 2 (due to StackOverflow)

R implementation and use

f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))

Using it on first 11 argumentssapply(010 f)

[1] 0 1 1 2 3 5 8 13 21 34 55

3158

Speed Example 2 (due to StackOverflow)

Timing

library(rbenchmark)benchmark(f(10) f(15) f(20))[14]

test replications elapsed relative 1 f(10) 100 0030 1000 2 f(15) 100 0335 11167 3 f(20) 100 3517 117233

3258

Speed Example 2 (due to StackOverflow)

int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))

deployed as

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

3358

Speed Example 2 (due to StackOverflow)

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(20) g(20) order=relative)[14]

test replications elapsed relative 2 g(20) 100 0011 1000 1 f(20) 100 3776 343273

A nice gain of a few orders of magnitude3458

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

3558

What Next

Hands-on

Programming with C++

middot C++ Basicsmiddot Debuggingmiddot Best Practices

and then on to Rcpp itself

3758

C++ Basics

Compiled not Interpreted

Need to compile and link

include ltcstdiogt

int main(void) printf(Hello worldn)return 0

3958

Compiled not Interpreted

Or streams output rather than printf

include ltiostreamgt

int main(void) stdcout ltlt Hello world ltlt stdendlreturn 0

4058

Compiled not Interpreted

g++ -o will compile and link

We will now look at an examples with explicit linking

4158

Compiled not Interpreted

include ltcstdiogt

define MATHLIB_STANDALONEinclude ltRmathhgt

int main(void) printf(N(01) 95th percentile 98fn

qnorm(095 00 10 1 0))

4258

Compiled not Interpreted

We may need to supply

middot header location via -Imiddot library location via -Lmiddot library via -llibraryname

g++ -Iusrinclude -c qnorm_rmathcppg++ -o qnorm_rmath qnorm_rmatho -Lusrlib -lRmath

4358

Statically Typed

middot R is dynamically typed x lt- 314 x lt- foo is validmiddot In C++ each variable must be declared before first usemiddot Common types are int and long (possibly with unsigned)float and double bool as well as char

middot No standard string type though stdstring is closemiddot All these variables types are scalars which is fundamentally

different from R where everything is a vectormiddot class (and struct) allow creation of composite types

classes add behaviour to data to form objectsmiddot Variables need to be declared cannot change

4458

C++ Basics A Better C

C++ is a Better C

middot control structures similar to what R offers for while ifswitch

middot functions are similar too but note the difference inpositional-only matching also same function name butdifferent arguments allowed in C++

middot pointers and memory management very different but lotsof issues people had with C can be avoided via STL (whichis something Rcpp promotes too)

middot sometimes still useful to know what a pointer is hellip

4658

Object-Oriented

This is a second key feature of C++ and it does it differentlyfrom S3 and S4

struct Date unsigned int yearunsigned int monthunsigned int day

struct Person char firstname[20]char lastname[20]struct Date birthdayunsigned long id

4758

Object-Oriented

Object-orientation in the C++ sense matches data with codeoperating on it

class Date private

unsigned int yearunsigned int monthunsigned int date

publicvoid setDate(int y int m int d)int getDay()int getMonth()int getYear()

4858

Generic Programming and the STL

The STL promotes generic programming

For example the sequence container types vector dequeand list all support

middot push_back() to insert at the endmiddot pop_back() to remove from the frontmiddot begin() returning an iterator to the first elementmiddot end() returning an iterator to just after the last elementmiddot size() for the number of elements

but only list has push_front() and pop_front()

Other useful containers set multiset map and multimap4958

Generic Programming and the STL

Traversal of containers can be achieved via iterators whichrequire suitable member functions begin() and end()

stdvectorltdoublegtconst_iterator sifor (si=sbegin() si = send() si++)

stdcout ltlt si ltlt stdendl

5058

Generic Programming and the STL

Another key STL part are algorithms

double sum = accumulate(sbegin() send() 0)

Some other STL algorithms are

middot find finds the first element equal to the supplied valuemiddot count counts the number of matching elementsmiddot transform applies a supplied function to each elementmiddot for_each sweeps over all elements does not altermiddot inner_product inner product of two vectors

5158

Template Programming

Template programming provides a lsquolanguage within C++rsquocode gets evaluated during compilation

One of the simplest template examples is

template lttypename Tgtconst Tamp min(const Tamp x const Tamp y)

return y lt x y x

This can now be used to compute the minimum between twoint variables or double or in fact any admissible typeproviding an operatorlt() for less-than comparison

5258

Template Programming

Another template example is a class squaring its argument

template lttypename Tgtclass square public stdunary_functionltTTgt public

T operator()(T t) const return tt

which can be used along with STL algorithms

transform(xbegin() xend() square)

5358

Further Reading

Books by Meyers are excellent

I also like the (free) C++ Annotations

C++ FAQ

Resources on StackOverflow such as

middot general info and its links egmiddot booklist

5458

Debugging

Debugging

Some tips

middot Generally painful old-school printf() still pervasivemiddot Debuggers go along with compilers gdb for gcc and g++lldb for the clang llvm family

middot Extra tools such as valgrind helpful for memory debuggingmiddot ldquoSanitizerrdquo (ASANUBSAN) in newer versions of g++ andclang++

5658

Best Practices

Best Practices

Version control highly recommended to become familiar withgit or svn

Editor in the long-run recommended to learn productivitytricks for one editor emacs vi eclipse RStudio hellip

5858

  • Overview
    • Why R
    • Why C++
    • Vision
    • Speed
      • What Next
      • C++ Basics
        • A Better C
          • Debugging
          • Best Practices
Page 9: Higher-Performance R via C++ - Part 1: Introductiondirk.eddelbuettel.com/papers/rcpp_uzuerich_2015_part1_intro.pdf · · C++11 is a big deal giving us new language features · While

A Simple Example

1 2 3 4 5 6

00

01

02

03

04

05

densitydefault(x = xx)

N = 272 Bandwidth = 03348

Den

sity

858

A Simple Example - Refined

xx lt- faithful[eruptions]fit1 lt- density(xx)fit2 lt- replicate(10000

x lt- sample(xxreplace=TRUE)density(x from=min(fit1$x) to=max(fit1$x))$y

)fit3 lt- apply(fit2 1 quantilec(00250975))plot(fit1 ylim=range(fit3))polygon(c(fit1$xrev(fit1$x)) c(fit3[1]rev(fit3[2]))

col=grey border=F)lines(fit1)

958

A Simple Example - Refined

1 2 3 4 5 6

00

01

02

03

04

05

densitydefault(x = xx)

N = 272 Bandwidth = 03348

Den

sity

1058

So Why R

R enables us to

middot work interactivelymiddot explore and visualize datamiddot access retrieve andor generate datamiddot summarize and report into pdf html hellip

making it the key language for statistical computing and apreferred environment for many data analysts

1158

So Why R

R has always been extensible via

middot C via a bare-bones interface described in Writing RExtensions

middot Fortran which is also used internally by Rmiddot Java via rJava by Simon Urbanekmiddot C++ but essentially at the bare-bones level of C

So while in theory this always worked ndash it was tedious inpractice

1258

Why Extend R

Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran

Since the core of R is in fact a program written in the Clanguage itrsquos not surprising that the most direct interfaceto non-R software is for code written in C or directlycallable from C All the same including additional C codeis a serious step with some added dangers and often asubstantial amount of programming and debuggingrequired You should have a good reason

1358

Why Extend R

Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran

Since the core of R is in fact a program written in the Clanguage itrsquos not surprising that the most direct interfaceto non-R software is for code written in C or directlycallable from C All the same including additional C codeis a serious step with some added dangers and often asubstantial amount of programming and debuggingrequired You should have a good reason

1458

Why Extend R

Chambers proceeds with this rough map of the road ahead

middot Againstmiddot Itrsquos more workmiddot Bugs will bitemiddot Potential platform dependencymiddot Less readable software

middot In Favormiddot New and trusted computationsmiddot Speedmiddot Object references

1558

Why Extend R

The Why boils down to

middot speed Often a good enough reason for us hellip and a focus forus in this workshop

middot new things We can bind to libraries and tools that wouldotherwise be unavailable in R

middot references Chambers quote from 2008 foreshadowed thework on the new Reference Classes now in R and built uponvia Rcpp Modules Rcpp Classes (and also RcppR6)

1658

Overview Why C++

Why C++

middot Asking Google leads to about 52 million hitsmiddot Wikipedia C++ is a statically typed free-form

multi-paradigm compiled general-purpose powerfulprogramming language

middot C++ is industrial-strength vendor-independent widely-usedand still evolving

middot In science amp research one of the most frequently-usedlanguages If there is something you want to use connectto it probably has a CC++ API

middot As a widely used language it also has good tool support(debuggers profilers code analysis)

1858

Why C++

Scott Meyers View C++ as a federation of languages

middot C provides a rich inheritance and interoperability as UnixWindows hellip are all build on C

middot Object-Oriented C++ (maybe just to provide endlessdiscussions about exactly what OO is or should be)

middot Templated C++ which is mighty powerful template metaprogramming unequalled in other languages

middot The Standard Template Library (STL) is a specific templatelibrary which is powerful but has its own conventions

middot C++11 (and C++14 and beyond) add enough to be calleda fifth language

NB Meyers original list of four languages appeared years before C++11

1958

Why C++

middot Mature yet currentmiddot Strong performance focus

middot You donrsquot pay for what you donrsquot usemiddot Leave no room for another language between the machine

level and C++

middot Yet also powerfully abstract and high-levelmiddot C++11 is a big deal giving us new language featuresmiddot While there are complexities Rcpp users are mostly shielded

2058

Overview Vision

Bell Labs May 1976

2258

Interface Vision

R offers us the best of both worlds

middot Compiled code withmiddot Access to proven libraries and algorithms in CC++Fortranmiddot Extremely high performance (in both serial and parallel modes)

middot Interpreted code withmiddot An accessible high-level language made for Programming with

Datamiddot An interactive workflow for data analysismiddot Support for rapid prototyping research and experimentation

2358

Why Rcpp

middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples

middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure

middot Expressive as it allows for vectorised C++ using Rcpp Sugarmiddot Seamless access to all R objects vector matrix list

S3S4RefClass Environment Function hellipmiddot Speed gains for a variety of tasks Rcpp excels precisely

where R struggles loops function calls hellipmiddot Extensions greatly facilitates access to external libraries

using eg Rcpp modules

2458

Overview Speed

Speed Example 1 (due to Christian Robert)

Five different ways to compute 1(1 + x)

f lt- function(n x=1) for(i in 1n) x lt- 1(1+x)g lt- function(n x=1) for(i in 1n) x lt- (1(1+x))h lt- function(n x=1) for(i in 1n) x lt- (1+x)^(-1)j lt- function(n x=1) for(i in 1n) x lt- 11+xk lt- function(n x=1) for(i in 1n) x lt- 11+xlibrary(rbenchmark)N lt- 1e5benchmark(f(N1)g(N1)h(N1)j(N1)k(N1)order=relative)[14]

2658

Speed Example 1 (due to Christian Robert)

test replications elapsed relative 5 k(N 1) 100 6435 1000 1 f(N 1) 100 6609 1027 2 g(N 1) 100 7757 1205 4 j(N 1) 100 7882 1225 3 h(N 1) 100 11766 1828

2758

Speed Example 1 (due to Christian Robert)

Adding a C++ variant is easy

cppFunction(double m(int n double x)

for (int i=0 iltn i++)x = 1 (1+x)

return x

)

(We will learn more about cppFunction() later)

2858

Speed Example 1 (due to Christian Robert)

test replications elapsed relative 6 m(N 1) 100 0170 1000 1 f(N 1) 100 6854 40318 5 k(N 1) 100 7811 45947 4 j(N 1) 100 9183 54018 2 g(N 1) 100 9489 55818 3 h(N 1) 100 11725 68971

2958

Speed Example 2 (due to StackOverflow)

Consider a function defined as

f(n) such that n when n lt 2

f(n minus 1) + f(n minus 2) when n ge 2

3058

Speed Example 2 (due to StackOverflow)

R implementation and use

f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))

Using it on first 11 argumentssapply(010 f)

[1] 0 1 1 2 3 5 8 13 21 34 55

3158

Speed Example 2 (due to StackOverflow)

Timing

library(rbenchmark)benchmark(f(10) f(15) f(20))[14]

test replications elapsed relative 1 f(10) 100 0030 1000 2 f(15) 100 0335 11167 3 f(20) 100 3517 117233

3258

Speed Example 2 (due to StackOverflow)

int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))

deployed as

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

3358

Speed Example 2 (due to StackOverflow)

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(20) g(20) order=relative)[14]

test replications elapsed relative 2 g(20) 100 0011 1000 1 f(20) 100 3776 343273

A nice gain of a few orders of magnitude3458

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

3558

What Next

Hands-on

Programming with C++

middot C++ Basicsmiddot Debuggingmiddot Best Practices

and then on to Rcpp itself

3758

C++ Basics

Compiled not Interpreted

Need to compile and link

include ltcstdiogt

int main(void) printf(Hello worldn)return 0

3958

Compiled not Interpreted

Or streams output rather than printf

include ltiostreamgt

int main(void) stdcout ltlt Hello world ltlt stdendlreturn 0

4058

Compiled not Interpreted

g++ -o will compile and link

We will now look at an examples with explicit linking

4158

Compiled not Interpreted

include ltcstdiogt

define MATHLIB_STANDALONEinclude ltRmathhgt

int main(void) printf(N(01) 95th percentile 98fn

qnorm(095 00 10 1 0))

4258

Compiled not Interpreted

We may need to supply

middot header location via -Imiddot library location via -Lmiddot library via -llibraryname

g++ -Iusrinclude -c qnorm_rmathcppg++ -o qnorm_rmath qnorm_rmatho -Lusrlib -lRmath

4358

Statically Typed

middot R is dynamically typed x lt- 314 x lt- foo is validmiddot In C++ each variable must be declared before first usemiddot Common types are int and long (possibly with unsigned)float and double bool as well as char

middot No standard string type though stdstring is closemiddot All these variables types are scalars which is fundamentally

different from R where everything is a vectormiddot class (and struct) allow creation of composite types

classes add behaviour to data to form objectsmiddot Variables need to be declared cannot change

4458

C++ Basics A Better C

C++ is a Better C

middot control structures similar to what R offers for while ifswitch

middot functions are similar too but note the difference inpositional-only matching also same function name butdifferent arguments allowed in C++

middot pointers and memory management very different but lotsof issues people had with C can be avoided via STL (whichis something Rcpp promotes too)

middot sometimes still useful to know what a pointer is hellip

4658

Object-Oriented

This is a second key feature of C++ and it does it differentlyfrom S3 and S4

struct Date unsigned int yearunsigned int monthunsigned int day

struct Person char firstname[20]char lastname[20]struct Date birthdayunsigned long id

4758

Object-Oriented

Object-orientation in the C++ sense matches data with codeoperating on it

class Date private

unsigned int yearunsigned int monthunsigned int date

publicvoid setDate(int y int m int d)int getDay()int getMonth()int getYear()

4858

Generic Programming and the STL

The STL promotes generic programming

For example the sequence container types vector dequeand list all support

middot push_back() to insert at the endmiddot pop_back() to remove from the frontmiddot begin() returning an iterator to the first elementmiddot end() returning an iterator to just after the last elementmiddot size() for the number of elements

but only list has push_front() and pop_front()

Other useful containers set multiset map and multimap4958

Generic Programming and the STL

Traversal of containers can be achieved via iterators whichrequire suitable member functions begin() and end()

stdvectorltdoublegtconst_iterator sifor (si=sbegin() si = send() si++)

stdcout ltlt si ltlt stdendl

5058

Generic Programming and the STL

Another key STL part are algorithms

double sum = accumulate(sbegin() send() 0)

Some other STL algorithms are

middot find finds the first element equal to the supplied valuemiddot count counts the number of matching elementsmiddot transform applies a supplied function to each elementmiddot for_each sweeps over all elements does not altermiddot inner_product inner product of two vectors

5158

Template Programming

Template programming provides a lsquolanguage within C++rsquocode gets evaluated during compilation

One of the simplest template examples is

template lttypename Tgtconst Tamp min(const Tamp x const Tamp y)

return y lt x y x

This can now be used to compute the minimum between twoint variables or double or in fact any admissible typeproviding an operatorlt() for less-than comparison

5258

Template Programming

Another template example is a class squaring its argument

template lttypename Tgtclass square public stdunary_functionltTTgt public

T operator()(T t) const return tt

which can be used along with STL algorithms

transform(xbegin() xend() square)

5358

Further Reading

Books by Meyers are excellent

I also like the (free) C++ Annotations

C++ FAQ

Resources on StackOverflow such as

middot general info and its links egmiddot booklist

5458

Debugging

Debugging

Some tips

middot Generally painful old-school printf() still pervasivemiddot Debuggers go along with compilers gdb for gcc and g++lldb for the clang llvm family

middot Extra tools such as valgrind helpful for memory debuggingmiddot ldquoSanitizerrdquo (ASANUBSAN) in newer versions of g++ andclang++

5658

Best Practices

Best Practices

Version control highly recommended to become familiar withgit or svn

Editor in the long-run recommended to learn productivitytricks for one editor emacs vi eclipse RStudio hellip

5858

  • Overview
    • Why R
    • Why C++
    • Vision
    • Speed
      • What Next
      • C++ Basics
        • A Better C
          • Debugging
          • Best Practices
Page 10: Higher-Performance R via C++ - Part 1: Introductiondirk.eddelbuettel.com/papers/rcpp_uzuerich_2015_part1_intro.pdf · · C++11 is a big deal giving us new language features · While

A Simple Example - Refined

xx lt- faithful[eruptions]fit1 lt- density(xx)fit2 lt- replicate(10000

x lt- sample(xxreplace=TRUE)density(x from=min(fit1$x) to=max(fit1$x))$y

)fit3 lt- apply(fit2 1 quantilec(00250975))plot(fit1 ylim=range(fit3))polygon(c(fit1$xrev(fit1$x)) c(fit3[1]rev(fit3[2]))

col=grey border=F)lines(fit1)

958

A Simple Example - Refined

1 2 3 4 5 6

00

01

02

03

04

05

densitydefault(x = xx)

N = 272 Bandwidth = 03348

Den

sity

1058

So Why R

R enables us to

middot work interactivelymiddot explore and visualize datamiddot access retrieve andor generate datamiddot summarize and report into pdf html hellip

making it the key language for statistical computing and apreferred environment for many data analysts

1158

So Why R

R has always been extensible via

middot C via a bare-bones interface described in Writing RExtensions

middot Fortran which is also used internally by Rmiddot Java via rJava by Simon Urbanekmiddot C++ but essentially at the bare-bones level of C

So while in theory this always worked ndash it was tedious inpractice

1258

Why Extend R

Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran

Since the core of R is in fact a program written in the Clanguage itrsquos not surprising that the most direct interfaceto non-R software is for code written in C or directlycallable from C All the same including additional C codeis a serious step with some added dangers and often asubstantial amount of programming and debuggingrequired You should have a good reason

1358

Why Extend R

Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran

Since the core of R is in fact a program written in the Clanguage itrsquos not surprising that the most direct interfaceto non-R software is for code written in C or directlycallable from C All the same including additional C codeis a serious step with some added dangers and often asubstantial amount of programming and debuggingrequired You should have a good reason

1458

Why Extend R

Chambers proceeds with this rough map of the road ahead

middot Againstmiddot Itrsquos more workmiddot Bugs will bitemiddot Potential platform dependencymiddot Less readable software

middot In Favormiddot New and trusted computationsmiddot Speedmiddot Object references

1558

Why Extend R

The Why boils down to

middot speed Often a good enough reason for us hellip and a focus forus in this workshop

middot new things We can bind to libraries and tools that wouldotherwise be unavailable in R

middot references Chambers quote from 2008 foreshadowed thework on the new Reference Classes now in R and built uponvia Rcpp Modules Rcpp Classes (and also RcppR6)

1658

Overview Why C++

Why C++

middot Asking Google leads to about 52 million hitsmiddot Wikipedia C++ is a statically typed free-form

multi-paradigm compiled general-purpose powerfulprogramming language

middot C++ is industrial-strength vendor-independent widely-usedand still evolving

middot In science amp research one of the most frequently-usedlanguages If there is something you want to use connectto it probably has a CC++ API

middot As a widely used language it also has good tool support(debuggers profilers code analysis)

1858

Why C++

Scott Meyers View C++ as a federation of languages

middot C provides a rich inheritance and interoperability as UnixWindows hellip are all build on C

middot Object-Oriented C++ (maybe just to provide endlessdiscussions about exactly what OO is or should be)

middot Templated C++ which is mighty powerful template metaprogramming unequalled in other languages

middot The Standard Template Library (STL) is a specific templatelibrary which is powerful but has its own conventions

middot C++11 (and C++14 and beyond) add enough to be calleda fifth language

NB Meyers original list of four languages appeared years before C++11

1958

Why C++

middot Mature yet currentmiddot Strong performance focus

middot You donrsquot pay for what you donrsquot usemiddot Leave no room for another language between the machine

level and C++

middot Yet also powerfully abstract and high-levelmiddot C++11 is a big deal giving us new language featuresmiddot While there are complexities Rcpp users are mostly shielded

2058

Overview Vision

Bell Labs May 1976

2258

Interface Vision

R offers us the best of both worlds

middot Compiled code withmiddot Access to proven libraries and algorithms in CC++Fortranmiddot Extremely high performance (in both serial and parallel modes)

middot Interpreted code withmiddot An accessible high-level language made for Programming with

Datamiddot An interactive workflow for data analysismiddot Support for rapid prototyping research and experimentation

2358

Why Rcpp

middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples

middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure

middot Expressive as it allows for vectorised C++ using Rcpp Sugarmiddot Seamless access to all R objects vector matrix list

S3S4RefClass Environment Function hellipmiddot Speed gains for a variety of tasks Rcpp excels precisely

where R struggles loops function calls hellipmiddot Extensions greatly facilitates access to external libraries

using eg Rcpp modules

2458

Overview Speed

Speed Example 1 (due to Christian Robert)

Five different ways to compute 1(1 + x)

f lt- function(n x=1) for(i in 1n) x lt- 1(1+x)g lt- function(n x=1) for(i in 1n) x lt- (1(1+x))h lt- function(n x=1) for(i in 1n) x lt- (1+x)^(-1)j lt- function(n x=1) for(i in 1n) x lt- 11+xk lt- function(n x=1) for(i in 1n) x lt- 11+xlibrary(rbenchmark)N lt- 1e5benchmark(f(N1)g(N1)h(N1)j(N1)k(N1)order=relative)[14]

2658

Speed Example 1 (due to Christian Robert)

test replications elapsed relative 5 k(N 1) 100 6435 1000 1 f(N 1) 100 6609 1027 2 g(N 1) 100 7757 1205 4 j(N 1) 100 7882 1225 3 h(N 1) 100 11766 1828

2758

Speed Example 1 (due to Christian Robert)

Adding a C++ variant is easy

cppFunction(double m(int n double x)

for (int i=0 iltn i++)x = 1 (1+x)

return x

)

(We will learn more about cppFunction() later)

2858

Speed Example 1 (due to Christian Robert)

test replications elapsed relative 6 m(N 1) 100 0170 1000 1 f(N 1) 100 6854 40318 5 k(N 1) 100 7811 45947 4 j(N 1) 100 9183 54018 2 g(N 1) 100 9489 55818 3 h(N 1) 100 11725 68971

2958

Speed Example 2 (due to StackOverflow)

Consider a function defined as

f(n) such that n when n lt 2

f(n minus 1) + f(n minus 2) when n ge 2

3058

Speed Example 2 (due to StackOverflow)

R implementation and use

f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))

Using it on first 11 argumentssapply(010 f)

[1] 0 1 1 2 3 5 8 13 21 34 55

3158

Speed Example 2 (due to StackOverflow)

Timing

library(rbenchmark)benchmark(f(10) f(15) f(20))[14]

test replications elapsed relative 1 f(10) 100 0030 1000 2 f(15) 100 0335 11167 3 f(20) 100 3517 117233

3258

Speed Example 2 (due to StackOverflow)

int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))

deployed as

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

3358

Speed Example 2 (due to StackOverflow)

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(20) g(20) order=relative)[14]

test replications elapsed relative 2 g(20) 100 0011 1000 1 f(20) 100 3776 343273

A nice gain of a few orders of magnitude3458

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

3558

What Next

Hands-on

Programming with C++

middot C++ Basicsmiddot Debuggingmiddot Best Practices

and then on to Rcpp itself

3758

C++ Basics

Compiled not Interpreted

Need to compile and link

include ltcstdiogt

int main(void) printf(Hello worldn)return 0

3958

Compiled not Interpreted

Or streams output rather than printf

include ltiostreamgt

int main(void) stdcout ltlt Hello world ltlt stdendlreturn 0

4058

Compiled not Interpreted

g++ -o will compile and link

We will now look at an examples with explicit linking

4158

Compiled not Interpreted

include ltcstdiogt

define MATHLIB_STANDALONEinclude ltRmathhgt

int main(void) printf(N(01) 95th percentile 98fn

qnorm(095 00 10 1 0))

4258

Compiled not Interpreted

We may need to supply

middot header location via -Imiddot library location via -Lmiddot library via -llibraryname

g++ -Iusrinclude -c qnorm_rmathcppg++ -o qnorm_rmath qnorm_rmatho -Lusrlib -lRmath

4358

Statically Typed

middot R is dynamically typed x lt- 314 x lt- foo is validmiddot In C++ each variable must be declared before first usemiddot Common types are int and long (possibly with unsigned)float and double bool as well as char

middot No standard string type though stdstring is closemiddot All these variables types are scalars which is fundamentally

different from R where everything is a vectormiddot class (and struct) allow creation of composite types

classes add behaviour to data to form objectsmiddot Variables need to be declared cannot change

4458

C++ Basics A Better C

C++ is a Better C

middot control structures similar to what R offers for while ifswitch

middot functions are similar too but note the difference inpositional-only matching also same function name butdifferent arguments allowed in C++

middot pointers and memory management very different but lotsof issues people had with C can be avoided via STL (whichis something Rcpp promotes too)

middot sometimes still useful to know what a pointer is hellip

4658

Object-Oriented

This is a second key feature of C++ and it does it differentlyfrom S3 and S4

struct Date unsigned int yearunsigned int monthunsigned int day

struct Person char firstname[20]char lastname[20]struct Date birthdayunsigned long id

4758

Object-Oriented

Object-orientation in the C++ sense matches data with codeoperating on it

class Date private

unsigned int yearunsigned int monthunsigned int date

publicvoid setDate(int y int m int d)int getDay()int getMonth()int getYear()

4858

Generic Programming and the STL

The STL promotes generic programming

For example the sequence container types vector dequeand list all support

middot push_back() to insert at the endmiddot pop_back() to remove from the frontmiddot begin() returning an iterator to the first elementmiddot end() returning an iterator to just after the last elementmiddot size() for the number of elements

but only list has push_front() and pop_front()

Other useful containers set multiset map and multimap4958

Generic Programming and the STL

Traversal of containers can be achieved via iterators whichrequire suitable member functions begin() and end()

stdvectorltdoublegtconst_iterator sifor (si=sbegin() si = send() si++)

stdcout ltlt si ltlt stdendl

5058

Generic Programming and the STL

Another key STL part are algorithms

double sum = accumulate(sbegin() send() 0)

Some other STL algorithms are

middot find finds the first element equal to the supplied valuemiddot count counts the number of matching elementsmiddot transform applies a supplied function to each elementmiddot for_each sweeps over all elements does not altermiddot inner_product inner product of two vectors

5158

Template Programming

Template programming provides a lsquolanguage within C++rsquocode gets evaluated during compilation

One of the simplest template examples is

template lttypename Tgtconst Tamp min(const Tamp x const Tamp y)

return y lt x y x

This can now be used to compute the minimum between twoint variables or double or in fact any admissible typeproviding an operatorlt() for less-than comparison

5258

Template Programming

Another template example is a class squaring its argument

template lttypename Tgtclass square public stdunary_functionltTTgt public

T operator()(T t) const return tt

which can be used along with STL algorithms

transform(xbegin() xend() square)

5358

Further Reading

Books by Meyers are excellent

I also like the (free) C++ Annotations

C++ FAQ

Resources on StackOverflow such as

middot general info and its links egmiddot booklist

5458

Debugging

Debugging

Some tips

middot Generally painful old-school printf() still pervasivemiddot Debuggers go along with compilers gdb for gcc and g++lldb for the clang llvm family

middot Extra tools such as valgrind helpful for memory debuggingmiddot ldquoSanitizerrdquo (ASANUBSAN) in newer versions of g++ andclang++

5658

Best Practices

Best Practices

Version control highly recommended to become familiar withgit or svn

Editor in the long-run recommended to learn productivitytricks for one editor emacs vi eclipse RStudio hellip

5858

  • Overview
    • Why R
    • Why C++
    • Vision
    • Speed
      • What Next
      • C++ Basics
        • A Better C
          • Debugging
          • Best Practices
Page 11: Higher-Performance R via C++ - Part 1: Introductiondirk.eddelbuettel.com/papers/rcpp_uzuerich_2015_part1_intro.pdf · · C++11 is a big deal giving us new language features · While

A Simple Example - Refined

1 2 3 4 5 6

00

01

02

03

04

05

densitydefault(x = xx)

N = 272 Bandwidth = 03348

Den

sity

1058

So Why R

R enables us to

middot work interactivelymiddot explore and visualize datamiddot access retrieve andor generate datamiddot summarize and report into pdf html hellip

making it the key language for statistical computing and apreferred environment for many data analysts

1158

So Why R

R has always been extensible via

middot C via a bare-bones interface described in Writing RExtensions

middot Fortran which is also used internally by Rmiddot Java via rJava by Simon Urbanekmiddot C++ but essentially at the bare-bones level of C

So while in theory this always worked ndash it was tedious inpractice

1258

Why Extend R

Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran

Since the core of R is in fact a program written in the Clanguage itrsquos not surprising that the most direct interfaceto non-R software is for code written in C or directlycallable from C All the same including additional C codeis a serious step with some added dangers and often asubstantial amount of programming and debuggingrequired You should have a good reason

1358

Why Extend R

Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran

Since the core of R is in fact a program written in the Clanguage itrsquos not surprising that the most direct interfaceto non-R software is for code written in C or directlycallable from C All the same including additional C codeis a serious step with some added dangers and often asubstantial amount of programming and debuggingrequired You should have a good reason

1458

Why Extend R

Chambers proceeds with this rough map of the road ahead

middot Againstmiddot Itrsquos more workmiddot Bugs will bitemiddot Potential platform dependencymiddot Less readable software

middot In Favormiddot New and trusted computationsmiddot Speedmiddot Object references

1558

Why Extend R

The Why boils down to

middot speed Often a good enough reason for us hellip and a focus forus in this workshop

middot new things We can bind to libraries and tools that wouldotherwise be unavailable in R

middot references Chambers quote from 2008 foreshadowed thework on the new Reference Classes now in R and built uponvia Rcpp Modules Rcpp Classes (and also RcppR6)

1658

Overview Why C++

Why C++

middot Asking Google leads to about 52 million hitsmiddot Wikipedia C++ is a statically typed free-form

multi-paradigm compiled general-purpose powerfulprogramming language

middot C++ is industrial-strength vendor-independent widely-usedand still evolving

middot In science amp research one of the most frequently-usedlanguages If there is something you want to use connectto it probably has a CC++ API

middot As a widely used language it also has good tool support(debuggers profilers code analysis)

1858

Why C++

Scott Meyers View C++ as a federation of languages

middot C provides a rich inheritance and interoperability as UnixWindows hellip are all build on C

middot Object-Oriented C++ (maybe just to provide endlessdiscussions about exactly what OO is or should be)

middot Templated C++ which is mighty powerful template metaprogramming unequalled in other languages

middot The Standard Template Library (STL) is a specific templatelibrary which is powerful but has its own conventions

middot C++11 (and C++14 and beyond) add enough to be calleda fifth language

NB Meyers original list of four languages appeared years before C++11

1958

Why C++

middot Mature yet currentmiddot Strong performance focus

middot You donrsquot pay for what you donrsquot usemiddot Leave no room for another language between the machine

level and C++

middot Yet also powerfully abstract and high-levelmiddot C++11 is a big deal giving us new language featuresmiddot While there are complexities Rcpp users are mostly shielded

2058

Overview Vision

Bell Labs May 1976

2258

Interface Vision

R offers us the best of both worlds

middot Compiled code withmiddot Access to proven libraries and algorithms in CC++Fortranmiddot Extremely high performance (in both serial and parallel modes)

middot Interpreted code withmiddot An accessible high-level language made for Programming with

Datamiddot An interactive workflow for data analysismiddot Support for rapid prototyping research and experimentation

2358

Why Rcpp

middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples

middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure

middot Expressive as it allows for vectorised C++ using Rcpp Sugarmiddot Seamless access to all R objects vector matrix list

S3S4RefClass Environment Function hellipmiddot Speed gains for a variety of tasks Rcpp excels precisely

where R struggles loops function calls hellipmiddot Extensions greatly facilitates access to external libraries

using eg Rcpp modules

2458

Overview Speed

Speed Example 1 (due to Christian Robert)

Five different ways to compute 1(1 + x)

f lt- function(n x=1) for(i in 1n) x lt- 1(1+x)g lt- function(n x=1) for(i in 1n) x lt- (1(1+x))h lt- function(n x=1) for(i in 1n) x lt- (1+x)^(-1)j lt- function(n x=1) for(i in 1n) x lt- 11+xk lt- function(n x=1) for(i in 1n) x lt- 11+xlibrary(rbenchmark)N lt- 1e5benchmark(f(N1)g(N1)h(N1)j(N1)k(N1)order=relative)[14]

2658

Speed Example 1 (due to Christian Robert)

test replications elapsed relative 5 k(N 1) 100 6435 1000 1 f(N 1) 100 6609 1027 2 g(N 1) 100 7757 1205 4 j(N 1) 100 7882 1225 3 h(N 1) 100 11766 1828

2758

Speed Example 1 (due to Christian Robert)

Adding a C++ variant is easy

cppFunction(double m(int n double x)

for (int i=0 iltn i++)x = 1 (1+x)

return x

)

(We will learn more about cppFunction() later)

2858

Speed Example 1 (due to Christian Robert)

test replications elapsed relative 6 m(N 1) 100 0170 1000 1 f(N 1) 100 6854 40318 5 k(N 1) 100 7811 45947 4 j(N 1) 100 9183 54018 2 g(N 1) 100 9489 55818 3 h(N 1) 100 11725 68971

2958

Speed Example 2 (due to StackOverflow)

Consider a function defined as

f(n) such that n when n lt 2

f(n minus 1) + f(n minus 2) when n ge 2

3058

Speed Example 2 (due to StackOverflow)

R implementation and use

f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))

Using it on first 11 argumentssapply(010 f)

[1] 0 1 1 2 3 5 8 13 21 34 55

3158

Speed Example 2 (due to StackOverflow)

Timing

library(rbenchmark)benchmark(f(10) f(15) f(20))[14]

test replications elapsed relative 1 f(10) 100 0030 1000 2 f(15) 100 0335 11167 3 f(20) 100 3517 117233

3258

Speed Example 2 (due to StackOverflow)

int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))

deployed as

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

3358

Speed Example 2 (due to StackOverflow)

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(20) g(20) order=relative)[14]

test replications elapsed relative 2 g(20) 100 0011 1000 1 f(20) 100 3776 343273

A nice gain of a few orders of magnitude3458

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

3558

What Next

Hands-on

Programming with C++

middot C++ Basicsmiddot Debuggingmiddot Best Practices

and then on to Rcpp itself

3758

C++ Basics

Compiled not Interpreted

Need to compile and link

include ltcstdiogt

int main(void) printf(Hello worldn)return 0

3958

Compiled not Interpreted

Or streams output rather than printf

include ltiostreamgt

int main(void) stdcout ltlt Hello world ltlt stdendlreturn 0

4058

Compiled not Interpreted

g++ -o will compile and link

We will now look at an examples with explicit linking

4158

Compiled not Interpreted

include ltcstdiogt

define MATHLIB_STANDALONEinclude ltRmathhgt

int main(void) printf(N(01) 95th percentile 98fn

qnorm(095 00 10 1 0))

4258

Compiled not Interpreted

We may need to supply

middot header location via -Imiddot library location via -Lmiddot library via -llibraryname

g++ -Iusrinclude -c qnorm_rmathcppg++ -o qnorm_rmath qnorm_rmatho -Lusrlib -lRmath

4358

Statically Typed

middot R is dynamically typed x lt- 314 x lt- foo is validmiddot In C++ each variable must be declared before first usemiddot Common types are int and long (possibly with unsigned)float and double bool as well as char

middot No standard string type though stdstring is closemiddot All these variables types are scalars which is fundamentally

different from R where everything is a vectormiddot class (and struct) allow creation of composite types

classes add behaviour to data to form objectsmiddot Variables need to be declared cannot change

4458

C++ Basics A Better C

C++ is a Better C

middot control structures similar to what R offers for while ifswitch

middot functions are similar too but note the difference inpositional-only matching also same function name butdifferent arguments allowed in C++

middot pointers and memory management very different but lotsof issues people had with C can be avoided via STL (whichis something Rcpp promotes too)

middot sometimes still useful to know what a pointer is hellip

4658

Object-Oriented

This is a second key feature of C++ and it does it differentlyfrom S3 and S4

struct Date unsigned int yearunsigned int monthunsigned int day

struct Person char firstname[20]char lastname[20]struct Date birthdayunsigned long id

4758

Object-Oriented

Object-orientation in the C++ sense matches data with codeoperating on it

class Date private

unsigned int yearunsigned int monthunsigned int date

publicvoid setDate(int y int m int d)int getDay()int getMonth()int getYear()

4858

Generic Programming and the STL

The STL promotes generic programming

For example the sequence container types vector dequeand list all support

middot push_back() to insert at the endmiddot pop_back() to remove from the frontmiddot begin() returning an iterator to the first elementmiddot end() returning an iterator to just after the last elementmiddot size() for the number of elements

but only list has push_front() and pop_front()

Other useful containers set multiset map and multimap4958

Generic Programming and the STL

Traversal of containers can be achieved via iterators whichrequire suitable member functions begin() and end()

stdvectorltdoublegtconst_iterator sifor (si=sbegin() si = send() si++)

stdcout ltlt si ltlt stdendl

5058

Generic Programming and the STL

Another key STL part are algorithms

double sum = accumulate(sbegin() send() 0)

Some other STL algorithms are

middot find finds the first element equal to the supplied valuemiddot count counts the number of matching elementsmiddot transform applies a supplied function to each elementmiddot for_each sweeps over all elements does not altermiddot inner_product inner product of two vectors

5158

Template Programming

Template programming provides a lsquolanguage within C++rsquocode gets evaluated during compilation

One of the simplest template examples is

template lttypename Tgtconst Tamp min(const Tamp x const Tamp y)

return y lt x y x

This can now be used to compute the minimum between twoint variables or double or in fact any admissible typeproviding an operatorlt() for less-than comparison

5258

Template Programming

Another template example is a class squaring its argument

template lttypename Tgtclass square public stdunary_functionltTTgt public

T operator()(T t) const return tt

which can be used along with STL algorithms

transform(xbegin() xend() square)

5358

Further Reading

Books by Meyers are excellent

I also like the (free) C++ Annotations

C++ FAQ

Resources on StackOverflow such as

middot general info and its links egmiddot booklist

5458

Debugging

Debugging

Some tips

middot Generally painful old-school printf() still pervasivemiddot Debuggers go along with compilers gdb for gcc and g++lldb for the clang llvm family

middot Extra tools such as valgrind helpful for memory debuggingmiddot ldquoSanitizerrdquo (ASANUBSAN) in newer versions of g++ andclang++

5658

Best Practices

Best Practices

Version control highly recommended to become familiar withgit or svn

Editor in the long-run recommended to learn productivitytricks for one editor emacs vi eclipse RStudio hellip

5858

  • Overview
    • Why R
    • Why C++
    • Vision
    • Speed
      • What Next
      • C++ Basics
        • A Better C
          • Debugging
          • Best Practices
Page 12: Higher-Performance R via C++ - Part 1: Introductiondirk.eddelbuettel.com/papers/rcpp_uzuerich_2015_part1_intro.pdf · · C++11 is a big deal giving us new language features · While

So Why R

R enables us to

middot work interactivelymiddot explore and visualize datamiddot access retrieve andor generate datamiddot summarize and report into pdf html hellip

making it the key language for statistical computing and apreferred environment for many data analysts

1158

So Why R

R has always been extensible via

middot C via a bare-bones interface described in Writing RExtensions

middot Fortran which is also used internally by Rmiddot Java via rJava by Simon Urbanekmiddot C++ but essentially at the bare-bones level of C

So while in theory this always worked ndash it was tedious inpractice

1258

Why Extend R

Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran

Since the core of R is in fact a program written in the Clanguage itrsquos not surprising that the most direct interfaceto non-R software is for code written in C or directlycallable from C All the same including additional C codeis a serious step with some added dangers and often asubstantial amount of programming and debuggingrequired You should have a good reason

1358

Why Extend R

Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran

Since the core of R is in fact a program written in the Clanguage itrsquos not surprising that the most direct interfaceto non-R software is for code written in C or directlycallable from C All the same including additional C codeis a serious step with some added dangers and often asubstantial amount of programming and debuggingrequired You should have a good reason

1458

Why Extend R

Chambers proceeds with this rough map of the road ahead

middot Againstmiddot Itrsquos more workmiddot Bugs will bitemiddot Potential platform dependencymiddot Less readable software

middot In Favormiddot New and trusted computationsmiddot Speedmiddot Object references

1558

Why Extend R

The Why boils down to

middot speed Often a good enough reason for us hellip and a focus forus in this workshop

middot new things We can bind to libraries and tools that wouldotherwise be unavailable in R

middot references Chambers quote from 2008 foreshadowed thework on the new Reference Classes now in R and built uponvia Rcpp Modules Rcpp Classes (and also RcppR6)

1658

Overview Why C++

Why C++

middot Asking Google leads to about 52 million hitsmiddot Wikipedia C++ is a statically typed free-form

multi-paradigm compiled general-purpose powerfulprogramming language

middot C++ is industrial-strength vendor-independent widely-usedand still evolving

middot In science amp research one of the most frequently-usedlanguages If there is something you want to use connectto it probably has a CC++ API

middot As a widely used language it also has good tool support(debuggers profilers code analysis)

1858

Why C++

Scott Meyers View C++ as a federation of languages

middot C provides a rich inheritance and interoperability as UnixWindows hellip are all build on C

middot Object-Oriented C++ (maybe just to provide endlessdiscussions about exactly what OO is or should be)

middot Templated C++ which is mighty powerful template metaprogramming unequalled in other languages

middot The Standard Template Library (STL) is a specific templatelibrary which is powerful but has its own conventions

middot C++11 (and C++14 and beyond) add enough to be calleda fifth language

NB Meyers original list of four languages appeared years before C++11

1958

Why C++

middot Mature yet currentmiddot Strong performance focus

middot You donrsquot pay for what you donrsquot usemiddot Leave no room for another language between the machine

level and C++

middot Yet also powerfully abstract and high-levelmiddot C++11 is a big deal giving us new language featuresmiddot While there are complexities Rcpp users are mostly shielded

2058

Overview Vision

Bell Labs May 1976

2258

Interface Vision

R offers us the best of both worlds

middot Compiled code withmiddot Access to proven libraries and algorithms in CC++Fortranmiddot Extremely high performance (in both serial and parallel modes)

middot Interpreted code withmiddot An accessible high-level language made for Programming with

Datamiddot An interactive workflow for data analysismiddot Support for rapid prototyping research and experimentation

2358

Why Rcpp

middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples

middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure

middot Expressive as it allows for vectorised C++ using Rcpp Sugarmiddot Seamless access to all R objects vector matrix list

S3S4RefClass Environment Function hellipmiddot Speed gains for a variety of tasks Rcpp excels precisely

where R struggles loops function calls hellipmiddot Extensions greatly facilitates access to external libraries

using eg Rcpp modules

2458

Overview Speed

Speed Example 1 (due to Christian Robert)

Five different ways to compute 1(1 + x)

f lt- function(n x=1) for(i in 1n) x lt- 1(1+x)g lt- function(n x=1) for(i in 1n) x lt- (1(1+x))h lt- function(n x=1) for(i in 1n) x lt- (1+x)^(-1)j lt- function(n x=1) for(i in 1n) x lt- 11+xk lt- function(n x=1) for(i in 1n) x lt- 11+xlibrary(rbenchmark)N lt- 1e5benchmark(f(N1)g(N1)h(N1)j(N1)k(N1)order=relative)[14]

2658

Speed Example 1 (due to Christian Robert)

test replications elapsed relative 5 k(N 1) 100 6435 1000 1 f(N 1) 100 6609 1027 2 g(N 1) 100 7757 1205 4 j(N 1) 100 7882 1225 3 h(N 1) 100 11766 1828

2758

Speed Example 1 (due to Christian Robert)

Adding a C++ variant is easy

cppFunction(double m(int n double x)

for (int i=0 iltn i++)x = 1 (1+x)

return x

)

(We will learn more about cppFunction() later)

2858

Speed Example 1 (due to Christian Robert)

test replications elapsed relative 6 m(N 1) 100 0170 1000 1 f(N 1) 100 6854 40318 5 k(N 1) 100 7811 45947 4 j(N 1) 100 9183 54018 2 g(N 1) 100 9489 55818 3 h(N 1) 100 11725 68971

2958

Speed Example 2 (due to StackOverflow)

Consider a function defined as

f(n) such that n when n lt 2

f(n minus 1) + f(n minus 2) when n ge 2

3058

Speed Example 2 (due to StackOverflow)

R implementation and use

f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))

Using it on first 11 argumentssapply(010 f)

[1] 0 1 1 2 3 5 8 13 21 34 55

3158

Speed Example 2 (due to StackOverflow)

Timing

library(rbenchmark)benchmark(f(10) f(15) f(20))[14]

test replications elapsed relative 1 f(10) 100 0030 1000 2 f(15) 100 0335 11167 3 f(20) 100 3517 117233

3258

Speed Example 2 (due to StackOverflow)

int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))

deployed as

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

3358

Speed Example 2 (due to StackOverflow)

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(20) g(20) order=relative)[14]

test replications elapsed relative 2 g(20) 100 0011 1000 1 f(20) 100 3776 343273

A nice gain of a few orders of magnitude3458

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

3558

What Next

Hands-on

Programming with C++

middot C++ Basicsmiddot Debuggingmiddot Best Practices

and then on to Rcpp itself

3758

C++ Basics

Compiled not Interpreted

Need to compile and link

include ltcstdiogt

int main(void) printf(Hello worldn)return 0

3958

Compiled not Interpreted

Or streams output rather than printf

include ltiostreamgt

int main(void) stdcout ltlt Hello world ltlt stdendlreturn 0

4058

Compiled not Interpreted

g++ -o will compile and link

We will now look at an examples with explicit linking

4158

Compiled not Interpreted

include ltcstdiogt

define MATHLIB_STANDALONEinclude ltRmathhgt

int main(void) printf(N(01) 95th percentile 98fn

qnorm(095 00 10 1 0))

4258

Compiled not Interpreted

We may need to supply

middot header location via -Imiddot library location via -Lmiddot library via -llibraryname

g++ -Iusrinclude -c qnorm_rmathcppg++ -o qnorm_rmath qnorm_rmatho -Lusrlib -lRmath

4358

Statically Typed

middot R is dynamically typed x lt- 314 x lt- foo is validmiddot In C++ each variable must be declared before first usemiddot Common types are int and long (possibly with unsigned)float and double bool as well as char

middot No standard string type though stdstring is closemiddot All these variables types are scalars which is fundamentally

different from R where everything is a vectormiddot class (and struct) allow creation of composite types

classes add behaviour to data to form objectsmiddot Variables need to be declared cannot change

4458

C++ Basics A Better C

C++ is a Better C

middot control structures similar to what R offers for while ifswitch

middot functions are similar too but note the difference inpositional-only matching also same function name butdifferent arguments allowed in C++

middot pointers and memory management very different but lotsof issues people had with C can be avoided via STL (whichis something Rcpp promotes too)

middot sometimes still useful to know what a pointer is hellip

4658

Object-Oriented

This is a second key feature of C++ and it does it differentlyfrom S3 and S4

struct Date unsigned int yearunsigned int monthunsigned int day

struct Person char firstname[20]char lastname[20]struct Date birthdayunsigned long id

4758

Object-Oriented

Object-orientation in the C++ sense matches data with codeoperating on it

class Date private

unsigned int yearunsigned int monthunsigned int date

publicvoid setDate(int y int m int d)int getDay()int getMonth()int getYear()

4858

Generic Programming and the STL

The STL promotes generic programming

For example the sequence container types vector dequeand list all support

middot push_back() to insert at the endmiddot pop_back() to remove from the frontmiddot begin() returning an iterator to the first elementmiddot end() returning an iterator to just after the last elementmiddot size() for the number of elements

but only list has push_front() and pop_front()

Other useful containers set multiset map and multimap4958

Generic Programming and the STL

Traversal of containers can be achieved via iterators whichrequire suitable member functions begin() and end()

stdvectorltdoublegtconst_iterator sifor (si=sbegin() si = send() si++)

stdcout ltlt si ltlt stdendl

5058

Generic Programming and the STL

Another key STL part are algorithms

double sum = accumulate(sbegin() send() 0)

Some other STL algorithms are

middot find finds the first element equal to the supplied valuemiddot count counts the number of matching elementsmiddot transform applies a supplied function to each elementmiddot for_each sweeps over all elements does not altermiddot inner_product inner product of two vectors

5158

Template Programming

Template programming provides a lsquolanguage within C++rsquocode gets evaluated during compilation

One of the simplest template examples is

template lttypename Tgtconst Tamp min(const Tamp x const Tamp y)

return y lt x y x

This can now be used to compute the minimum between twoint variables or double or in fact any admissible typeproviding an operatorlt() for less-than comparison

5258

Template Programming

Another template example is a class squaring its argument

template lttypename Tgtclass square public stdunary_functionltTTgt public

T operator()(T t) const return tt

which can be used along with STL algorithms

transform(xbegin() xend() square)

5358

Further Reading

Books by Meyers are excellent

I also like the (free) C++ Annotations

C++ FAQ

Resources on StackOverflow such as

middot general info and its links egmiddot booklist

5458

Debugging

Debugging

Some tips

middot Generally painful old-school printf() still pervasivemiddot Debuggers go along with compilers gdb for gcc and g++lldb for the clang llvm family

middot Extra tools such as valgrind helpful for memory debuggingmiddot ldquoSanitizerrdquo (ASANUBSAN) in newer versions of g++ andclang++

5658

Best Practices

Best Practices

Version control highly recommended to become familiar withgit or svn

Editor in the long-run recommended to learn productivitytricks for one editor emacs vi eclipse RStudio hellip

5858

  • Overview
    • Why R
    • Why C++
    • Vision
    • Speed
      • What Next
      • C++ Basics
        • A Better C
          • Debugging
          • Best Practices
Page 13: Higher-Performance R via C++ - Part 1: Introductiondirk.eddelbuettel.com/papers/rcpp_uzuerich_2015_part1_intro.pdf · · C++11 is a big deal giving us new language features · While

So Why R

R has always been extensible via

middot C via a bare-bones interface described in Writing RExtensions

middot Fortran which is also used internally by Rmiddot Java via rJava by Simon Urbanekmiddot C++ but essentially at the bare-bones level of C

So while in theory this always worked ndash it was tedious inpractice

1258

Why Extend R

Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran

Since the core of R is in fact a program written in the Clanguage itrsquos not surprising that the most direct interfaceto non-R software is for code written in C or directlycallable from C All the same including additional C codeis a serious step with some added dangers and often asubstantial amount of programming and debuggingrequired You should have a good reason

1358

Why Extend R

Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran

Since the core of R is in fact a program written in the Clanguage itrsquos not surprising that the most direct interfaceto non-R software is for code written in C or directlycallable from C All the same including additional C codeis a serious step with some added dangers and often asubstantial amount of programming and debuggingrequired You should have a good reason

1458

Why Extend R

Chambers proceeds with this rough map of the road ahead

middot Againstmiddot Itrsquos more workmiddot Bugs will bitemiddot Potential platform dependencymiddot Less readable software

middot In Favormiddot New and trusted computationsmiddot Speedmiddot Object references

1558

Why Extend R

The Why boils down to

middot speed Often a good enough reason for us hellip and a focus forus in this workshop

middot new things We can bind to libraries and tools that wouldotherwise be unavailable in R

middot references Chambers quote from 2008 foreshadowed thework on the new Reference Classes now in R and built uponvia Rcpp Modules Rcpp Classes (and also RcppR6)

1658

Overview Why C++

Why C++

middot Asking Google leads to about 52 million hitsmiddot Wikipedia C++ is a statically typed free-form

multi-paradigm compiled general-purpose powerfulprogramming language

middot C++ is industrial-strength vendor-independent widely-usedand still evolving

middot In science amp research one of the most frequently-usedlanguages If there is something you want to use connectto it probably has a CC++ API

middot As a widely used language it also has good tool support(debuggers profilers code analysis)

1858

Why C++

Scott Meyers View C++ as a federation of languages

middot C provides a rich inheritance and interoperability as UnixWindows hellip are all build on C

middot Object-Oriented C++ (maybe just to provide endlessdiscussions about exactly what OO is or should be)

middot Templated C++ which is mighty powerful template metaprogramming unequalled in other languages

middot The Standard Template Library (STL) is a specific templatelibrary which is powerful but has its own conventions

middot C++11 (and C++14 and beyond) add enough to be calleda fifth language

NB Meyers original list of four languages appeared years before C++11

1958

Why C++

middot Mature yet currentmiddot Strong performance focus

middot You donrsquot pay for what you donrsquot usemiddot Leave no room for another language between the machine

level and C++

middot Yet also powerfully abstract and high-levelmiddot C++11 is a big deal giving us new language featuresmiddot While there are complexities Rcpp users are mostly shielded

2058

Overview Vision

Bell Labs May 1976

2258

Interface Vision

R offers us the best of both worlds

middot Compiled code withmiddot Access to proven libraries and algorithms in CC++Fortranmiddot Extremely high performance (in both serial and parallel modes)

middot Interpreted code withmiddot An accessible high-level language made for Programming with

Datamiddot An interactive workflow for data analysismiddot Support for rapid prototyping research and experimentation

2358

Why Rcpp

middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples

middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure

middot Expressive as it allows for vectorised C++ using Rcpp Sugarmiddot Seamless access to all R objects vector matrix list

S3S4RefClass Environment Function hellipmiddot Speed gains for a variety of tasks Rcpp excels precisely

where R struggles loops function calls hellipmiddot Extensions greatly facilitates access to external libraries

using eg Rcpp modules

2458

Overview Speed

Speed Example 1 (due to Christian Robert)

Five different ways to compute 1(1 + x)

f lt- function(n x=1) for(i in 1n) x lt- 1(1+x)g lt- function(n x=1) for(i in 1n) x lt- (1(1+x))h lt- function(n x=1) for(i in 1n) x lt- (1+x)^(-1)j lt- function(n x=1) for(i in 1n) x lt- 11+xk lt- function(n x=1) for(i in 1n) x lt- 11+xlibrary(rbenchmark)N lt- 1e5benchmark(f(N1)g(N1)h(N1)j(N1)k(N1)order=relative)[14]

2658

Speed Example 1 (due to Christian Robert)

test replications elapsed relative 5 k(N 1) 100 6435 1000 1 f(N 1) 100 6609 1027 2 g(N 1) 100 7757 1205 4 j(N 1) 100 7882 1225 3 h(N 1) 100 11766 1828

2758

Speed Example 1 (due to Christian Robert)

Adding a C++ variant is easy

cppFunction(double m(int n double x)

for (int i=0 iltn i++)x = 1 (1+x)

return x

)

(We will learn more about cppFunction() later)

2858

Speed Example 1 (due to Christian Robert)

test replications elapsed relative 6 m(N 1) 100 0170 1000 1 f(N 1) 100 6854 40318 5 k(N 1) 100 7811 45947 4 j(N 1) 100 9183 54018 2 g(N 1) 100 9489 55818 3 h(N 1) 100 11725 68971

2958

Speed Example 2 (due to StackOverflow)

Consider a function defined as

f(n) such that n when n lt 2

f(n minus 1) + f(n minus 2) when n ge 2

3058

Speed Example 2 (due to StackOverflow)

R implementation and use

f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))

Using it on first 11 argumentssapply(010 f)

[1] 0 1 1 2 3 5 8 13 21 34 55

3158

Speed Example 2 (due to StackOverflow)

Timing

library(rbenchmark)benchmark(f(10) f(15) f(20))[14]

test replications elapsed relative 1 f(10) 100 0030 1000 2 f(15) 100 0335 11167 3 f(20) 100 3517 117233

3258

Speed Example 2 (due to StackOverflow)

int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))

deployed as

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

3358

Speed Example 2 (due to StackOverflow)

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(20) g(20) order=relative)[14]

test replications elapsed relative 2 g(20) 100 0011 1000 1 f(20) 100 3776 343273

A nice gain of a few orders of magnitude3458

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

3558

What Next

Hands-on

Programming with C++

middot C++ Basicsmiddot Debuggingmiddot Best Practices

and then on to Rcpp itself

3758

C++ Basics

Compiled not Interpreted

Need to compile and link

include ltcstdiogt

int main(void) printf(Hello worldn)return 0

3958

Compiled not Interpreted

Or streams output rather than printf

include ltiostreamgt

int main(void) stdcout ltlt Hello world ltlt stdendlreturn 0

4058

Compiled not Interpreted

g++ -o will compile and link

We will now look at an examples with explicit linking

4158

Compiled not Interpreted

include ltcstdiogt

define MATHLIB_STANDALONEinclude ltRmathhgt

int main(void) printf(N(01) 95th percentile 98fn

qnorm(095 00 10 1 0))

4258

Compiled not Interpreted

We may need to supply

middot header location via -Imiddot library location via -Lmiddot library via -llibraryname

g++ -Iusrinclude -c qnorm_rmathcppg++ -o qnorm_rmath qnorm_rmatho -Lusrlib -lRmath

4358

Statically Typed

middot R is dynamically typed x lt- 314 x lt- foo is validmiddot In C++ each variable must be declared before first usemiddot Common types are int and long (possibly with unsigned)float and double bool as well as char

middot No standard string type though stdstring is closemiddot All these variables types are scalars which is fundamentally

different from R where everything is a vectormiddot class (and struct) allow creation of composite types

classes add behaviour to data to form objectsmiddot Variables need to be declared cannot change

4458

C++ Basics A Better C

C++ is a Better C

middot control structures similar to what R offers for while ifswitch

middot functions are similar too but note the difference inpositional-only matching also same function name butdifferent arguments allowed in C++

middot pointers and memory management very different but lotsof issues people had with C can be avoided via STL (whichis something Rcpp promotes too)

middot sometimes still useful to know what a pointer is hellip

4658

Object-Oriented

This is a second key feature of C++ and it does it differentlyfrom S3 and S4

struct Date unsigned int yearunsigned int monthunsigned int day

struct Person char firstname[20]char lastname[20]struct Date birthdayunsigned long id

4758

Object-Oriented

Object-orientation in the C++ sense matches data with codeoperating on it

class Date private

unsigned int yearunsigned int monthunsigned int date

publicvoid setDate(int y int m int d)int getDay()int getMonth()int getYear()

4858

Generic Programming and the STL

The STL promotes generic programming

For example the sequence container types vector dequeand list all support

middot push_back() to insert at the endmiddot pop_back() to remove from the frontmiddot begin() returning an iterator to the first elementmiddot end() returning an iterator to just after the last elementmiddot size() for the number of elements

but only list has push_front() and pop_front()

Other useful containers set multiset map and multimap4958

Generic Programming and the STL

Traversal of containers can be achieved via iterators whichrequire suitable member functions begin() and end()

stdvectorltdoublegtconst_iterator sifor (si=sbegin() si = send() si++)

stdcout ltlt si ltlt stdendl

5058

Generic Programming and the STL

Another key STL part are algorithms

double sum = accumulate(sbegin() send() 0)

Some other STL algorithms are

middot find finds the first element equal to the supplied valuemiddot count counts the number of matching elementsmiddot transform applies a supplied function to each elementmiddot for_each sweeps over all elements does not altermiddot inner_product inner product of two vectors

5158

Template Programming

Template programming provides a lsquolanguage within C++rsquocode gets evaluated during compilation

One of the simplest template examples is

template lttypename Tgtconst Tamp min(const Tamp x const Tamp y)

return y lt x y x

This can now be used to compute the minimum between twoint variables or double or in fact any admissible typeproviding an operatorlt() for less-than comparison

5258

Template Programming

Another template example is a class squaring its argument

template lttypename Tgtclass square public stdunary_functionltTTgt public

T operator()(T t) const return tt

which can be used along with STL algorithms

transform(xbegin() xend() square)

5358

Further Reading

Books by Meyers are excellent

I also like the (free) C++ Annotations

C++ FAQ

Resources on StackOverflow such as

middot general info and its links egmiddot booklist

5458

Debugging

Debugging

Some tips

middot Generally painful old-school printf() still pervasivemiddot Debuggers go along with compilers gdb for gcc and g++lldb for the clang llvm family

middot Extra tools such as valgrind helpful for memory debuggingmiddot ldquoSanitizerrdquo (ASANUBSAN) in newer versions of g++ andclang++

5658

Best Practices

Best Practices

Version control highly recommended to become familiar withgit or svn

Editor in the long-run recommended to learn productivitytricks for one editor emacs vi eclipse RStudio hellip

5858

  • Overview
    • Why R
    • Why C++
    • Vision
    • Speed
      • What Next
      • C++ Basics
        • A Better C
          • Debugging
          • Best Practices
Page 14: Higher-Performance R via C++ - Part 1: Introductiondirk.eddelbuettel.com/papers/rcpp_uzuerich_2015_part1_intro.pdf · · C++11 is a big deal giving us new language features · While

Why Extend R

Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran

Since the core of R is in fact a program written in the Clanguage itrsquos not surprising that the most direct interfaceto non-R software is for code written in C or directlycallable from C All the same including additional C codeis a serious step with some added dangers and often asubstantial amount of programming and debuggingrequired You should have a good reason

1358

Why Extend R

Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran

Since the core of R is in fact a program written in the Clanguage itrsquos not surprising that the most direct interfaceto non-R software is for code written in C or directlycallable from C All the same including additional C codeis a serious step with some added dangers and often asubstantial amount of programming and debuggingrequired You should have a good reason

1458

Why Extend R

Chambers proceeds with this rough map of the road ahead

middot Againstmiddot Itrsquos more workmiddot Bugs will bitemiddot Potential platform dependencymiddot Less readable software

middot In Favormiddot New and trusted computationsmiddot Speedmiddot Object references

1558

Why Extend R

The Why boils down to

middot speed Often a good enough reason for us hellip and a focus forus in this workshop

middot new things We can bind to libraries and tools that wouldotherwise be unavailable in R

middot references Chambers quote from 2008 foreshadowed thework on the new Reference Classes now in R and built uponvia Rcpp Modules Rcpp Classes (and also RcppR6)

1658

Overview Why C++

Why C++

middot Asking Google leads to about 52 million hitsmiddot Wikipedia C++ is a statically typed free-form

multi-paradigm compiled general-purpose powerfulprogramming language

middot C++ is industrial-strength vendor-independent widely-usedand still evolving

middot In science amp research one of the most frequently-usedlanguages If there is something you want to use connectto it probably has a CC++ API

middot As a widely used language it also has good tool support(debuggers profilers code analysis)

1858

Why C++

Scott Meyers View C++ as a federation of languages

middot C provides a rich inheritance and interoperability as UnixWindows hellip are all build on C

middot Object-Oriented C++ (maybe just to provide endlessdiscussions about exactly what OO is or should be)

middot Templated C++ which is mighty powerful template metaprogramming unequalled in other languages

middot The Standard Template Library (STL) is a specific templatelibrary which is powerful but has its own conventions

middot C++11 (and C++14 and beyond) add enough to be calleda fifth language

NB Meyers original list of four languages appeared years before C++11

1958

Why C++

middot Mature yet currentmiddot Strong performance focus

middot You donrsquot pay for what you donrsquot usemiddot Leave no room for another language between the machine

level and C++

middot Yet also powerfully abstract and high-levelmiddot C++11 is a big deal giving us new language featuresmiddot While there are complexities Rcpp users are mostly shielded

2058

Overview Vision

Bell Labs May 1976

2258

Interface Vision

R offers us the best of both worlds

middot Compiled code withmiddot Access to proven libraries and algorithms in CC++Fortranmiddot Extremely high performance (in both serial and parallel modes)

middot Interpreted code withmiddot An accessible high-level language made for Programming with

Datamiddot An interactive workflow for data analysismiddot Support for rapid prototyping research and experimentation

2358

Why Rcpp

middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples

middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure

middot Expressive as it allows for vectorised C++ using Rcpp Sugarmiddot Seamless access to all R objects vector matrix list

S3S4RefClass Environment Function hellipmiddot Speed gains for a variety of tasks Rcpp excels precisely

where R struggles loops function calls hellipmiddot Extensions greatly facilitates access to external libraries

using eg Rcpp modules

2458

Overview Speed

Speed Example 1 (due to Christian Robert)

Five different ways to compute 1(1 + x)

f lt- function(n x=1) for(i in 1n) x lt- 1(1+x)g lt- function(n x=1) for(i in 1n) x lt- (1(1+x))h lt- function(n x=1) for(i in 1n) x lt- (1+x)^(-1)j lt- function(n x=1) for(i in 1n) x lt- 11+xk lt- function(n x=1) for(i in 1n) x lt- 11+xlibrary(rbenchmark)N lt- 1e5benchmark(f(N1)g(N1)h(N1)j(N1)k(N1)order=relative)[14]

2658

Speed Example 1 (due to Christian Robert)

test replications elapsed relative 5 k(N 1) 100 6435 1000 1 f(N 1) 100 6609 1027 2 g(N 1) 100 7757 1205 4 j(N 1) 100 7882 1225 3 h(N 1) 100 11766 1828

2758

Speed Example 1 (due to Christian Robert)

Adding a C++ variant is easy

cppFunction(double m(int n double x)

for (int i=0 iltn i++)x = 1 (1+x)

return x

)

(We will learn more about cppFunction() later)

2858

Speed Example 1 (due to Christian Robert)

test replications elapsed relative 6 m(N 1) 100 0170 1000 1 f(N 1) 100 6854 40318 5 k(N 1) 100 7811 45947 4 j(N 1) 100 9183 54018 2 g(N 1) 100 9489 55818 3 h(N 1) 100 11725 68971

2958

Speed Example 2 (due to StackOverflow)

Consider a function defined as

f(n) such that n when n lt 2

f(n minus 1) + f(n minus 2) when n ge 2

3058

Speed Example 2 (due to StackOverflow)

R implementation and use

f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))

Using it on first 11 argumentssapply(010 f)

[1] 0 1 1 2 3 5 8 13 21 34 55

3158

Speed Example 2 (due to StackOverflow)

Timing

library(rbenchmark)benchmark(f(10) f(15) f(20))[14]

test replications elapsed relative 1 f(10) 100 0030 1000 2 f(15) 100 0335 11167 3 f(20) 100 3517 117233

3258

Speed Example 2 (due to StackOverflow)

int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))

deployed as

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

3358

Speed Example 2 (due to StackOverflow)

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(20) g(20) order=relative)[14]

test replications elapsed relative 2 g(20) 100 0011 1000 1 f(20) 100 3776 343273

A nice gain of a few orders of magnitude3458

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

3558

What Next

Hands-on

Programming with C++

middot C++ Basicsmiddot Debuggingmiddot Best Practices

and then on to Rcpp itself

3758

C++ Basics

Compiled not Interpreted

Need to compile and link

include ltcstdiogt

int main(void) printf(Hello worldn)return 0

3958

Compiled not Interpreted

Or streams output rather than printf

include ltiostreamgt

int main(void) stdcout ltlt Hello world ltlt stdendlreturn 0

4058

Compiled not Interpreted

g++ -o will compile and link

We will now look at an examples with explicit linking

4158

Compiled not Interpreted

include ltcstdiogt

define MATHLIB_STANDALONEinclude ltRmathhgt

int main(void) printf(N(01) 95th percentile 98fn

qnorm(095 00 10 1 0))

4258

Compiled not Interpreted

We may need to supply

middot header location via -Imiddot library location via -Lmiddot library via -llibraryname

g++ -Iusrinclude -c qnorm_rmathcppg++ -o qnorm_rmath qnorm_rmatho -Lusrlib -lRmath

4358

Statically Typed

middot R is dynamically typed x lt- 314 x lt- foo is validmiddot In C++ each variable must be declared before first usemiddot Common types are int and long (possibly with unsigned)float and double bool as well as char

middot No standard string type though stdstring is closemiddot All these variables types are scalars which is fundamentally

different from R where everything is a vectormiddot class (and struct) allow creation of composite types

classes add behaviour to data to form objectsmiddot Variables need to be declared cannot change

4458

C++ Basics A Better C

C++ is a Better C

middot control structures similar to what R offers for while ifswitch

middot functions are similar too but note the difference inpositional-only matching also same function name butdifferent arguments allowed in C++

middot pointers and memory management very different but lotsof issues people had with C can be avoided via STL (whichis something Rcpp promotes too)

middot sometimes still useful to know what a pointer is hellip

4658

Object-Oriented

This is a second key feature of C++ and it does it differentlyfrom S3 and S4

struct Date unsigned int yearunsigned int monthunsigned int day

struct Person char firstname[20]char lastname[20]struct Date birthdayunsigned long id

4758

Object-Oriented

Object-orientation in the C++ sense matches data with codeoperating on it

class Date private

unsigned int yearunsigned int monthunsigned int date

publicvoid setDate(int y int m int d)int getDay()int getMonth()int getYear()

4858

Generic Programming and the STL

The STL promotes generic programming

For example the sequence container types vector dequeand list all support

middot push_back() to insert at the endmiddot pop_back() to remove from the frontmiddot begin() returning an iterator to the first elementmiddot end() returning an iterator to just after the last elementmiddot size() for the number of elements

but only list has push_front() and pop_front()

Other useful containers set multiset map and multimap4958

Generic Programming and the STL

Traversal of containers can be achieved via iterators whichrequire suitable member functions begin() and end()

stdvectorltdoublegtconst_iterator sifor (si=sbegin() si = send() si++)

stdcout ltlt si ltlt stdendl

5058

Generic Programming and the STL

Another key STL part are algorithms

double sum = accumulate(sbegin() send() 0)

Some other STL algorithms are

middot find finds the first element equal to the supplied valuemiddot count counts the number of matching elementsmiddot transform applies a supplied function to each elementmiddot for_each sweeps over all elements does not altermiddot inner_product inner product of two vectors

5158

Template Programming

Template programming provides a lsquolanguage within C++rsquocode gets evaluated during compilation

One of the simplest template examples is

template lttypename Tgtconst Tamp min(const Tamp x const Tamp y)

return y lt x y x

This can now be used to compute the minimum between twoint variables or double or in fact any admissible typeproviding an operatorlt() for less-than comparison

5258

Template Programming

Another template example is a class squaring its argument

template lttypename Tgtclass square public stdunary_functionltTTgt public

T operator()(T t) const return tt

which can be used along with STL algorithms

transform(xbegin() xend() square)

5358

Further Reading

Books by Meyers are excellent

I also like the (free) C++ Annotations

C++ FAQ

Resources on StackOverflow such as

middot general info and its links egmiddot booklist

5458

Debugging

Debugging

Some tips

middot Generally painful old-school printf() still pervasivemiddot Debuggers go along with compilers gdb for gcc and g++lldb for the clang llvm family

middot Extra tools such as valgrind helpful for memory debuggingmiddot ldquoSanitizerrdquo (ASANUBSAN) in newer versions of g++ andclang++

5658

Best Practices

Best Practices

Version control highly recommended to become familiar withgit or svn

Editor in the long-run recommended to learn productivitytricks for one editor emacs vi eclipse RStudio hellip

5858

  • Overview
    • Why R
    • Why C++
    • Vision
    • Speed
      • What Next
      • C++ Basics
        • A Better C
          • Debugging
          • Best Practices
Page 15: Higher-Performance R via C++ - Part 1: Introductiondirk.eddelbuettel.com/papers/rcpp_uzuerich_2015_part1_intro.pdf · · C++11 is a big deal giving us new language features · While

Why Extend R

Chambers (2008) opens Chapter 11 Interfaces I Using C andFortran

Since the core of R is in fact a program written in the Clanguage itrsquos not surprising that the most direct interfaceto non-R software is for code written in C or directlycallable from C All the same including additional C codeis a serious step with some added dangers and often asubstantial amount of programming and debuggingrequired You should have a good reason

1458

Why Extend R

Chambers proceeds with this rough map of the road ahead

middot Againstmiddot Itrsquos more workmiddot Bugs will bitemiddot Potential platform dependencymiddot Less readable software

middot In Favormiddot New and trusted computationsmiddot Speedmiddot Object references

1558

Why Extend R

The Why boils down to

middot speed Often a good enough reason for us hellip and a focus forus in this workshop

middot new things We can bind to libraries and tools that wouldotherwise be unavailable in R

middot references Chambers quote from 2008 foreshadowed thework on the new Reference Classes now in R and built uponvia Rcpp Modules Rcpp Classes (and also RcppR6)

1658

Overview Why C++

Why C++

middot Asking Google leads to about 52 million hitsmiddot Wikipedia C++ is a statically typed free-form

multi-paradigm compiled general-purpose powerfulprogramming language

middot C++ is industrial-strength vendor-independent widely-usedand still evolving

middot In science amp research one of the most frequently-usedlanguages If there is something you want to use connectto it probably has a CC++ API

middot As a widely used language it also has good tool support(debuggers profilers code analysis)

1858

Why C++

Scott Meyers View C++ as a federation of languages

middot C provides a rich inheritance and interoperability as UnixWindows hellip are all build on C

middot Object-Oriented C++ (maybe just to provide endlessdiscussions about exactly what OO is or should be)

middot Templated C++ which is mighty powerful template metaprogramming unequalled in other languages

middot The Standard Template Library (STL) is a specific templatelibrary which is powerful but has its own conventions

middot C++11 (and C++14 and beyond) add enough to be calleda fifth language

NB Meyers original list of four languages appeared years before C++11

1958

Why C++

middot Mature yet currentmiddot Strong performance focus

middot You donrsquot pay for what you donrsquot usemiddot Leave no room for another language between the machine

level and C++

middot Yet also powerfully abstract and high-levelmiddot C++11 is a big deal giving us new language featuresmiddot While there are complexities Rcpp users are mostly shielded

2058

Overview Vision

Bell Labs May 1976

2258

Interface Vision

R offers us the best of both worlds

middot Compiled code withmiddot Access to proven libraries and algorithms in CC++Fortranmiddot Extremely high performance (in both serial and parallel modes)

middot Interpreted code withmiddot An accessible high-level language made for Programming with

Datamiddot An interactive workflow for data analysismiddot Support for rapid prototyping research and experimentation

2358

Why Rcpp

middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples

middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure

middot Expressive as it allows for vectorised C++ using Rcpp Sugarmiddot Seamless access to all R objects vector matrix list

S3S4RefClass Environment Function hellipmiddot Speed gains for a variety of tasks Rcpp excels precisely

where R struggles loops function calls hellipmiddot Extensions greatly facilitates access to external libraries

using eg Rcpp modules

2458

Overview Speed

Speed Example 1 (due to Christian Robert)

Five different ways to compute 1(1 + x)

f lt- function(n x=1) for(i in 1n) x lt- 1(1+x)g lt- function(n x=1) for(i in 1n) x lt- (1(1+x))h lt- function(n x=1) for(i in 1n) x lt- (1+x)^(-1)j lt- function(n x=1) for(i in 1n) x lt- 11+xk lt- function(n x=1) for(i in 1n) x lt- 11+xlibrary(rbenchmark)N lt- 1e5benchmark(f(N1)g(N1)h(N1)j(N1)k(N1)order=relative)[14]

2658

Speed Example 1 (due to Christian Robert)

test replications elapsed relative 5 k(N 1) 100 6435 1000 1 f(N 1) 100 6609 1027 2 g(N 1) 100 7757 1205 4 j(N 1) 100 7882 1225 3 h(N 1) 100 11766 1828

2758

Speed Example 1 (due to Christian Robert)

Adding a C++ variant is easy

cppFunction(double m(int n double x)

for (int i=0 iltn i++)x = 1 (1+x)

return x

)

(We will learn more about cppFunction() later)

2858

Speed Example 1 (due to Christian Robert)

test replications elapsed relative 6 m(N 1) 100 0170 1000 1 f(N 1) 100 6854 40318 5 k(N 1) 100 7811 45947 4 j(N 1) 100 9183 54018 2 g(N 1) 100 9489 55818 3 h(N 1) 100 11725 68971

2958

Speed Example 2 (due to StackOverflow)

Consider a function defined as

f(n) such that n when n lt 2

f(n minus 1) + f(n minus 2) when n ge 2

3058

Speed Example 2 (due to StackOverflow)

R implementation and use

f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))

Using it on first 11 argumentssapply(010 f)

[1] 0 1 1 2 3 5 8 13 21 34 55

3158

Speed Example 2 (due to StackOverflow)

Timing

library(rbenchmark)benchmark(f(10) f(15) f(20))[14]

test replications elapsed relative 1 f(10) 100 0030 1000 2 f(15) 100 0335 11167 3 f(20) 100 3517 117233

3258

Speed Example 2 (due to StackOverflow)

int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))

deployed as

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

3358

Speed Example 2 (due to StackOverflow)

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(20) g(20) order=relative)[14]

test replications elapsed relative 2 g(20) 100 0011 1000 1 f(20) 100 3776 343273

A nice gain of a few orders of magnitude3458

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

3558

What Next

Hands-on

Programming with C++

middot C++ Basicsmiddot Debuggingmiddot Best Practices

and then on to Rcpp itself

3758

C++ Basics

Compiled not Interpreted

Need to compile and link

include ltcstdiogt

int main(void) printf(Hello worldn)return 0

3958

Compiled not Interpreted

Or streams output rather than printf

include ltiostreamgt

int main(void) stdcout ltlt Hello world ltlt stdendlreturn 0

4058

Compiled not Interpreted

g++ -o will compile and link

We will now look at an examples with explicit linking

4158

Compiled not Interpreted

include ltcstdiogt

define MATHLIB_STANDALONEinclude ltRmathhgt

int main(void) printf(N(01) 95th percentile 98fn

qnorm(095 00 10 1 0))

4258

Compiled not Interpreted

We may need to supply

middot header location via -Imiddot library location via -Lmiddot library via -llibraryname

g++ -Iusrinclude -c qnorm_rmathcppg++ -o qnorm_rmath qnorm_rmatho -Lusrlib -lRmath

4358

Statically Typed

middot R is dynamically typed x lt- 314 x lt- foo is validmiddot In C++ each variable must be declared before first usemiddot Common types are int and long (possibly with unsigned)float and double bool as well as char

middot No standard string type though stdstring is closemiddot All these variables types are scalars which is fundamentally

different from R where everything is a vectormiddot class (and struct) allow creation of composite types

classes add behaviour to data to form objectsmiddot Variables need to be declared cannot change

4458

C++ Basics A Better C

C++ is a Better C

middot control structures similar to what R offers for while ifswitch

middot functions are similar too but note the difference inpositional-only matching also same function name butdifferent arguments allowed in C++

middot pointers and memory management very different but lotsof issues people had with C can be avoided via STL (whichis something Rcpp promotes too)

middot sometimes still useful to know what a pointer is hellip

4658

Object-Oriented

This is a second key feature of C++ and it does it differentlyfrom S3 and S4

struct Date unsigned int yearunsigned int monthunsigned int day

struct Person char firstname[20]char lastname[20]struct Date birthdayunsigned long id

4758

Object-Oriented

Object-orientation in the C++ sense matches data with codeoperating on it

class Date private

unsigned int yearunsigned int monthunsigned int date

publicvoid setDate(int y int m int d)int getDay()int getMonth()int getYear()

4858

Generic Programming and the STL

The STL promotes generic programming

For example the sequence container types vector dequeand list all support

middot push_back() to insert at the endmiddot pop_back() to remove from the frontmiddot begin() returning an iterator to the first elementmiddot end() returning an iterator to just after the last elementmiddot size() for the number of elements

but only list has push_front() and pop_front()

Other useful containers set multiset map and multimap4958

Generic Programming and the STL

Traversal of containers can be achieved via iterators whichrequire suitable member functions begin() and end()

stdvectorltdoublegtconst_iterator sifor (si=sbegin() si = send() si++)

stdcout ltlt si ltlt stdendl

5058

Generic Programming and the STL

Another key STL part are algorithms

double sum = accumulate(sbegin() send() 0)

Some other STL algorithms are

middot find finds the first element equal to the supplied valuemiddot count counts the number of matching elementsmiddot transform applies a supplied function to each elementmiddot for_each sweeps over all elements does not altermiddot inner_product inner product of two vectors

5158

Template Programming

Template programming provides a lsquolanguage within C++rsquocode gets evaluated during compilation

One of the simplest template examples is

template lttypename Tgtconst Tamp min(const Tamp x const Tamp y)

return y lt x y x

This can now be used to compute the minimum between twoint variables or double or in fact any admissible typeproviding an operatorlt() for less-than comparison

5258

Template Programming

Another template example is a class squaring its argument

template lttypename Tgtclass square public stdunary_functionltTTgt public

T operator()(T t) const return tt

which can be used along with STL algorithms

transform(xbegin() xend() square)

5358

Further Reading

Books by Meyers are excellent

I also like the (free) C++ Annotations

C++ FAQ

Resources on StackOverflow such as

middot general info and its links egmiddot booklist

5458

Debugging

Debugging

Some tips

middot Generally painful old-school printf() still pervasivemiddot Debuggers go along with compilers gdb for gcc and g++lldb for the clang llvm family

middot Extra tools such as valgrind helpful for memory debuggingmiddot ldquoSanitizerrdquo (ASANUBSAN) in newer versions of g++ andclang++

5658

Best Practices

Best Practices

Version control highly recommended to become familiar withgit or svn

Editor in the long-run recommended to learn productivitytricks for one editor emacs vi eclipse RStudio hellip

5858

  • Overview
    • Why R
    • Why C++
    • Vision
    • Speed
      • What Next
      • C++ Basics
        • A Better C
          • Debugging
          • Best Practices
Page 16: Higher-Performance R via C++ - Part 1: Introductiondirk.eddelbuettel.com/papers/rcpp_uzuerich_2015_part1_intro.pdf · · C++11 is a big deal giving us new language features · While

Why Extend R

Chambers proceeds with this rough map of the road ahead

middot Againstmiddot Itrsquos more workmiddot Bugs will bitemiddot Potential platform dependencymiddot Less readable software

middot In Favormiddot New and trusted computationsmiddot Speedmiddot Object references

1558

Why Extend R

The Why boils down to

middot speed Often a good enough reason for us hellip and a focus forus in this workshop

middot new things We can bind to libraries and tools that wouldotherwise be unavailable in R

middot references Chambers quote from 2008 foreshadowed thework on the new Reference Classes now in R and built uponvia Rcpp Modules Rcpp Classes (and also RcppR6)

1658

Overview Why C++

Why C++

middot Asking Google leads to about 52 million hitsmiddot Wikipedia C++ is a statically typed free-form

multi-paradigm compiled general-purpose powerfulprogramming language

middot C++ is industrial-strength vendor-independent widely-usedand still evolving

middot In science amp research one of the most frequently-usedlanguages If there is something you want to use connectto it probably has a CC++ API

middot As a widely used language it also has good tool support(debuggers profilers code analysis)

1858

Why C++

Scott Meyers View C++ as a federation of languages

middot C provides a rich inheritance and interoperability as UnixWindows hellip are all build on C

middot Object-Oriented C++ (maybe just to provide endlessdiscussions about exactly what OO is or should be)

middot Templated C++ which is mighty powerful template metaprogramming unequalled in other languages

middot The Standard Template Library (STL) is a specific templatelibrary which is powerful but has its own conventions

middot C++11 (and C++14 and beyond) add enough to be calleda fifth language

NB Meyers original list of four languages appeared years before C++11

1958

Why C++

middot Mature yet currentmiddot Strong performance focus

middot You donrsquot pay for what you donrsquot usemiddot Leave no room for another language between the machine

level and C++

middot Yet also powerfully abstract and high-levelmiddot C++11 is a big deal giving us new language featuresmiddot While there are complexities Rcpp users are mostly shielded

2058

Overview Vision

Bell Labs May 1976

2258

Interface Vision

R offers us the best of both worlds

middot Compiled code withmiddot Access to proven libraries and algorithms in CC++Fortranmiddot Extremely high performance (in both serial and parallel modes)

middot Interpreted code withmiddot An accessible high-level language made for Programming with

Datamiddot An interactive workflow for data analysismiddot Support for rapid prototyping research and experimentation

2358

Why Rcpp

middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples

middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure

middot Expressive as it allows for vectorised C++ using Rcpp Sugarmiddot Seamless access to all R objects vector matrix list

S3S4RefClass Environment Function hellipmiddot Speed gains for a variety of tasks Rcpp excels precisely

where R struggles loops function calls hellipmiddot Extensions greatly facilitates access to external libraries

using eg Rcpp modules

2458

Overview Speed

Speed Example 1 (due to Christian Robert)

Five different ways to compute 1(1 + x)

f lt- function(n x=1) for(i in 1n) x lt- 1(1+x)g lt- function(n x=1) for(i in 1n) x lt- (1(1+x))h lt- function(n x=1) for(i in 1n) x lt- (1+x)^(-1)j lt- function(n x=1) for(i in 1n) x lt- 11+xk lt- function(n x=1) for(i in 1n) x lt- 11+xlibrary(rbenchmark)N lt- 1e5benchmark(f(N1)g(N1)h(N1)j(N1)k(N1)order=relative)[14]

2658

Speed Example 1 (due to Christian Robert)

test replications elapsed relative 5 k(N 1) 100 6435 1000 1 f(N 1) 100 6609 1027 2 g(N 1) 100 7757 1205 4 j(N 1) 100 7882 1225 3 h(N 1) 100 11766 1828

2758

Speed Example 1 (due to Christian Robert)

Adding a C++ variant is easy

cppFunction(double m(int n double x)

for (int i=0 iltn i++)x = 1 (1+x)

return x

)

(We will learn more about cppFunction() later)

2858

Speed Example 1 (due to Christian Robert)

test replications elapsed relative 6 m(N 1) 100 0170 1000 1 f(N 1) 100 6854 40318 5 k(N 1) 100 7811 45947 4 j(N 1) 100 9183 54018 2 g(N 1) 100 9489 55818 3 h(N 1) 100 11725 68971

2958

Speed Example 2 (due to StackOverflow)

Consider a function defined as

f(n) such that n when n lt 2

f(n minus 1) + f(n minus 2) when n ge 2

3058

Speed Example 2 (due to StackOverflow)

R implementation and use

f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))

Using it on first 11 argumentssapply(010 f)

[1] 0 1 1 2 3 5 8 13 21 34 55

3158

Speed Example 2 (due to StackOverflow)

Timing

library(rbenchmark)benchmark(f(10) f(15) f(20))[14]

test replications elapsed relative 1 f(10) 100 0030 1000 2 f(15) 100 0335 11167 3 f(20) 100 3517 117233

3258

Speed Example 2 (due to StackOverflow)

int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))

deployed as

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

3358

Speed Example 2 (due to StackOverflow)

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(20) g(20) order=relative)[14]

test replications elapsed relative 2 g(20) 100 0011 1000 1 f(20) 100 3776 343273

A nice gain of a few orders of magnitude3458

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

3558

What Next

Hands-on

Programming with C++

middot C++ Basicsmiddot Debuggingmiddot Best Practices

and then on to Rcpp itself

3758

C++ Basics

Compiled not Interpreted

Need to compile and link

include ltcstdiogt

int main(void) printf(Hello worldn)return 0

3958

Compiled not Interpreted

Or streams output rather than printf

include ltiostreamgt

int main(void) stdcout ltlt Hello world ltlt stdendlreturn 0

4058

Compiled not Interpreted

g++ -o will compile and link

We will now look at an examples with explicit linking

4158

Compiled not Interpreted

include ltcstdiogt

define MATHLIB_STANDALONEinclude ltRmathhgt

int main(void) printf(N(01) 95th percentile 98fn

qnorm(095 00 10 1 0))

4258

Compiled not Interpreted

We may need to supply

middot header location via -Imiddot library location via -Lmiddot library via -llibraryname

g++ -Iusrinclude -c qnorm_rmathcppg++ -o qnorm_rmath qnorm_rmatho -Lusrlib -lRmath

4358

Statically Typed

middot R is dynamically typed x lt- 314 x lt- foo is validmiddot In C++ each variable must be declared before first usemiddot Common types are int and long (possibly with unsigned)float and double bool as well as char

middot No standard string type though stdstring is closemiddot All these variables types are scalars which is fundamentally

different from R where everything is a vectormiddot class (and struct) allow creation of composite types

classes add behaviour to data to form objectsmiddot Variables need to be declared cannot change

4458

C++ Basics A Better C

C++ is a Better C

middot control structures similar to what R offers for while ifswitch

middot functions are similar too but note the difference inpositional-only matching also same function name butdifferent arguments allowed in C++

middot pointers and memory management very different but lotsof issues people had with C can be avoided via STL (whichis something Rcpp promotes too)

middot sometimes still useful to know what a pointer is hellip

4658

Object-Oriented

This is a second key feature of C++ and it does it differentlyfrom S3 and S4

struct Date unsigned int yearunsigned int monthunsigned int day

struct Person char firstname[20]char lastname[20]struct Date birthdayunsigned long id

4758

Object-Oriented

Object-orientation in the C++ sense matches data with codeoperating on it

class Date private

unsigned int yearunsigned int monthunsigned int date

publicvoid setDate(int y int m int d)int getDay()int getMonth()int getYear()

4858

Generic Programming and the STL

The STL promotes generic programming

For example the sequence container types vector dequeand list all support

middot push_back() to insert at the endmiddot pop_back() to remove from the frontmiddot begin() returning an iterator to the first elementmiddot end() returning an iterator to just after the last elementmiddot size() for the number of elements

but only list has push_front() and pop_front()

Other useful containers set multiset map and multimap4958

Generic Programming and the STL

Traversal of containers can be achieved via iterators whichrequire suitable member functions begin() and end()

stdvectorltdoublegtconst_iterator sifor (si=sbegin() si = send() si++)

stdcout ltlt si ltlt stdendl

5058

Generic Programming and the STL

Another key STL part are algorithms

double sum = accumulate(sbegin() send() 0)

Some other STL algorithms are

middot find finds the first element equal to the supplied valuemiddot count counts the number of matching elementsmiddot transform applies a supplied function to each elementmiddot for_each sweeps over all elements does not altermiddot inner_product inner product of two vectors

5158

Template Programming

Template programming provides a lsquolanguage within C++rsquocode gets evaluated during compilation

One of the simplest template examples is

template lttypename Tgtconst Tamp min(const Tamp x const Tamp y)

return y lt x y x

This can now be used to compute the minimum between twoint variables or double or in fact any admissible typeproviding an operatorlt() for less-than comparison

5258

Template Programming

Another template example is a class squaring its argument

template lttypename Tgtclass square public stdunary_functionltTTgt public

T operator()(T t) const return tt

which can be used along with STL algorithms

transform(xbegin() xend() square)

5358

Further Reading

Books by Meyers are excellent

I also like the (free) C++ Annotations

C++ FAQ

Resources on StackOverflow such as

middot general info and its links egmiddot booklist

5458

Debugging

Debugging

Some tips

middot Generally painful old-school printf() still pervasivemiddot Debuggers go along with compilers gdb for gcc and g++lldb for the clang llvm family

middot Extra tools such as valgrind helpful for memory debuggingmiddot ldquoSanitizerrdquo (ASANUBSAN) in newer versions of g++ andclang++

5658

Best Practices

Best Practices

Version control highly recommended to become familiar withgit or svn

Editor in the long-run recommended to learn productivitytricks for one editor emacs vi eclipse RStudio hellip

5858

  • Overview
    • Why R
    • Why C++
    • Vision
    • Speed
      • What Next
      • C++ Basics
        • A Better C
          • Debugging
          • Best Practices
Page 17: Higher-Performance R via C++ - Part 1: Introductiondirk.eddelbuettel.com/papers/rcpp_uzuerich_2015_part1_intro.pdf · · C++11 is a big deal giving us new language features · While

Why Extend R

The Why boils down to

middot speed Often a good enough reason for us hellip and a focus forus in this workshop

middot new things We can bind to libraries and tools that wouldotherwise be unavailable in R

middot references Chambers quote from 2008 foreshadowed thework on the new Reference Classes now in R and built uponvia Rcpp Modules Rcpp Classes (and also RcppR6)

1658

Overview Why C++

Why C++

middot Asking Google leads to about 52 million hitsmiddot Wikipedia C++ is a statically typed free-form

multi-paradigm compiled general-purpose powerfulprogramming language

middot C++ is industrial-strength vendor-independent widely-usedand still evolving

middot In science amp research one of the most frequently-usedlanguages If there is something you want to use connectto it probably has a CC++ API

middot As a widely used language it also has good tool support(debuggers profilers code analysis)

1858

Why C++

Scott Meyers View C++ as a federation of languages

middot C provides a rich inheritance and interoperability as UnixWindows hellip are all build on C

middot Object-Oriented C++ (maybe just to provide endlessdiscussions about exactly what OO is or should be)

middot Templated C++ which is mighty powerful template metaprogramming unequalled in other languages

middot The Standard Template Library (STL) is a specific templatelibrary which is powerful but has its own conventions

middot C++11 (and C++14 and beyond) add enough to be calleda fifth language

NB Meyers original list of four languages appeared years before C++11

1958

Why C++

middot Mature yet currentmiddot Strong performance focus

middot You donrsquot pay for what you donrsquot usemiddot Leave no room for another language between the machine

level and C++

middot Yet also powerfully abstract and high-levelmiddot C++11 is a big deal giving us new language featuresmiddot While there are complexities Rcpp users are mostly shielded

2058

Overview Vision

Bell Labs May 1976

2258

Interface Vision

R offers us the best of both worlds

middot Compiled code withmiddot Access to proven libraries and algorithms in CC++Fortranmiddot Extremely high performance (in both serial and parallel modes)

middot Interpreted code withmiddot An accessible high-level language made for Programming with

Datamiddot An interactive workflow for data analysismiddot Support for rapid prototyping research and experimentation

2358

Why Rcpp

middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples

middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure

middot Expressive as it allows for vectorised C++ using Rcpp Sugarmiddot Seamless access to all R objects vector matrix list

S3S4RefClass Environment Function hellipmiddot Speed gains for a variety of tasks Rcpp excels precisely

where R struggles loops function calls hellipmiddot Extensions greatly facilitates access to external libraries

using eg Rcpp modules

2458

Overview Speed

Speed Example 1 (due to Christian Robert)

Five different ways to compute 1(1 + x)

f lt- function(n x=1) for(i in 1n) x lt- 1(1+x)g lt- function(n x=1) for(i in 1n) x lt- (1(1+x))h lt- function(n x=1) for(i in 1n) x lt- (1+x)^(-1)j lt- function(n x=1) for(i in 1n) x lt- 11+xk lt- function(n x=1) for(i in 1n) x lt- 11+xlibrary(rbenchmark)N lt- 1e5benchmark(f(N1)g(N1)h(N1)j(N1)k(N1)order=relative)[14]

2658

Speed Example 1 (due to Christian Robert)

test replications elapsed relative 5 k(N 1) 100 6435 1000 1 f(N 1) 100 6609 1027 2 g(N 1) 100 7757 1205 4 j(N 1) 100 7882 1225 3 h(N 1) 100 11766 1828

2758

Speed Example 1 (due to Christian Robert)

Adding a C++ variant is easy

cppFunction(double m(int n double x)

for (int i=0 iltn i++)x = 1 (1+x)

return x

)

(We will learn more about cppFunction() later)

2858

Speed Example 1 (due to Christian Robert)

test replications elapsed relative 6 m(N 1) 100 0170 1000 1 f(N 1) 100 6854 40318 5 k(N 1) 100 7811 45947 4 j(N 1) 100 9183 54018 2 g(N 1) 100 9489 55818 3 h(N 1) 100 11725 68971

2958

Speed Example 2 (due to StackOverflow)

Consider a function defined as

f(n) such that n when n lt 2

f(n minus 1) + f(n minus 2) when n ge 2

3058

Speed Example 2 (due to StackOverflow)

R implementation and use

f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))

Using it on first 11 argumentssapply(010 f)

[1] 0 1 1 2 3 5 8 13 21 34 55

3158

Speed Example 2 (due to StackOverflow)

Timing

library(rbenchmark)benchmark(f(10) f(15) f(20))[14]

test replications elapsed relative 1 f(10) 100 0030 1000 2 f(15) 100 0335 11167 3 f(20) 100 3517 117233

3258

Speed Example 2 (due to StackOverflow)

int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))

deployed as

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

3358

Speed Example 2 (due to StackOverflow)

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(20) g(20) order=relative)[14]

test replications elapsed relative 2 g(20) 100 0011 1000 1 f(20) 100 3776 343273

A nice gain of a few orders of magnitude3458

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

3558

What Next

Hands-on

Programming with C++

middot C++ Basicsmiddot Debuggingmiddot Best Practices

and then on to Rcpp itself

3758

C++ Basics

Compiled not Interpreted

Need to compile and link

include ltcstdiogt

int main(void) printf(Hello worldn)return 0

3958

Compiled not Interpreted

Or streams output rather than printf

include ltiostreamgt

int main(void) stdcout ltlt Hello world ltlt stdendlreturn 0

4058

Compiled not Interpreted

g++ -o will compile and link

We will now look at an examples with explicit linking

4158

Compiled not Interpreted

include ltcstdiogt

define MATHLIB_STANDALONEinclude ltRmathhgt

int main(void) printf(N(01) 95th percentile 98fn

qnorm(095 00 10 1 0))

4258

Compiled not Interpreted

We may need to supply

middot header location via -Imiddot library location via -Lmiddot library via -llibraryname

g++ -Iusrinclude -c qnorm_rmathcppg++ -o qnorm_rmath qnorm_rmatho -Lusrlib -lRmath

4358

Statically Typed

middot R is dynamically typed x lt- 314 x lt- foo is validmiddot In C++ each variable must be declared before first usemiddot Common types are int and long (possibly with unsigned)float and double bool as well as char

middot No standard string type though stdstring is closemiddot All these variables types are scalars which is fundamentally

different from R where everything is a vectormiddot class (and struct) allow creation of composite types

classes add behaviour to data to form objectsmiddot Variables need to be declared cannot change

4458

C++ Basics A Better C

C++ is a Better C

middot control structures similar to what R offers for while ifswitch

middot functions are similar too but note the difference inpositional-only matching also same function name butdifferent arguments allowed in C++

middot pointers and memory management very different but lotsof issues people had with C can be avoided via STL (whichis something Rcpp promotes too)

middot sometimes still useful to know what a pointer is hellip

4658

Object-Oriented

This is a second key feature of C++ and it does it differentlyfrom S3 and S4

struct Date unsigned int yearunsigned int monthunsigned int day

struct Person char firstname[20]char lastname[20]struct Date birthdayunsigned long id

4758

Object-Oriented

Object-orientation in the C++ sense matches data with codeoperating on it

class Date private

unsigned int yearunsigned int monthunsigned int date

publicvoid setDate(int y int m int d)int getDay()int getMonth()int getYear()

4858

Generic Programming and the STL

The STL promotes generic programming

For example the sequence container types vector dequeand list all support

middot push_back() to insert at the endmiddot pop_back() to remove from the frontmiddot begin() returning an iterator to the first elementmiddot end() returning an iterator to just after the last elementmiddot size() for the number of elements

but only list has push_front() and pop_front()

Other useful containers set multiset map and multimap4958

Generic Programming and the STL

Traversal of containers can be achieved via iterators whichrequire suitable member functions begin() and end()

stdvectorltdoublegtconst_iterator sifor (si=sbegin() si = send() si++)

stdcout ltlt si ltlt stdendl

5058

Generic Programming and the STL

Another key STL part are algorithms

double sum = accumulate(sbegin() send() 0)

Some other STL algorithms are

middot find finds the first element equal to the supplied valuemiddot count counts the number of matching elementsmiddot transform applies a supplied function to each elementmiddot for_each sweeps over all elements does not altermiddot inner_product inner product of two vectors

5158

Template Programming

Template programming provides a lsquolanguage within C++rsquocode gets evaluated during compilation

One of the simplest template examples is

template lttypename Tgtconst Tamp min(const Tamp x const Tamp y)

return y lt x y x

This can now be used to compute the minimum between twoint variables or double or in fact any admissible typeproviding an operatorlt() for less-than comparison

5258

Template Programming

Another template example is a class squaring its argument

template lttypename Tgtclass square public stdunary_functionltTTgt public

T operator()(T t) const return tt

which can be used along with STL algorithms

transform(xbegin() xend() square)

5358

Further Reading

Books by Meyers are excellent

I also like the (free) C++ Annotations

C++ FAQ

Resources on StackOverflow such as

middot general info and its links egmiddot booklist

5458

Debugging

Debugging

Some tips

middot Generally painful old-school printf() still pervasivemiddot Debuggers go along with compilers gdb for gcc and g++lldb for the clang llvm family

middot Extra tools such as valgrind helpful for memory debuggingmiddot ldquoSanitizerrdquo (ASANUBSAN) in newer versions of g++ andclang++

5658

Best Practices

Best Practices

Version control highly recommended to become familiar withgit or svn

Editor in the long-run recommended to learn productivitytricks for one editor emacs vi eclipse RStudio hellip

5858

  • Overview
    • Why R
    • Why C++
    • Vision
    • Speed
      • What Next
      • C++ Basics
        • A Better C
          • Debugging
          • Best Practices
Page 18: Higher-Performance R via C++ - Part 1: Introductiondirk.eddelbuettel.com/papers/rcpp_uzuerich_2015_part1_intro.pdf · · C++11 is a big deal giving us new language features · While

Overview Why C++

Why C++

middot Asking Google leads to about 52 million hitsmiddot Wikipedia C++ is a statically typed free-form

multi-paradigm compiled general-purpose powerfulprogramming language

middot C++ is industrial-strength vendor-independent widely-usedand still evolving

middot In science amp research one of the most frequently-usedlanguages If there is something you want to use connectto it probably has a CC++ API

middot As a widely used language it also has good tool support(debuggers profilers code analysis)

1858

Why C++

Scott Meyers View C++ as a federation of languages

middot C provides a rich inheritance and interoperability as UnixWindows hellip are all build on C

middot Object-Oriented C++ (maybe just to provide endlessdiscussions about exactly what OO is or should be)

middot Templated C++ which is mighty powerful template metaprogramming unequalled in other languages

middot The Standard Template Library (STL) is a specific templatelibrary which is powerful but has its own conventions

middot C++11 (and C++14 and beyond) add enough to be calleda fifth language

NB Meyers original list of four languages appeared years before C++11

1958

Why C++

middot Mature yet currentmiddot Strong performance focus

middot You donrsquot pay for what you donrsquot usemiddot Leave no room for another language between the machine

level and C++

middot Yet also powerfully abstract and high-levelmiddot C++11 is a big deal giving us new language featuresmiddot While there are complexities Rcpp users are mostly shielded

2058

Overview Vision

Bell Labs May 1976

2258

Interface Vision

R offers us the best of both worlds

middot Compiled code withmiddot Access to proven libraries and algorithms in CC++Fortranmiddot Extremely high performance (in both serial and parallel modes)

middot Interpreted code withmiddot An accessible high-level language made for Programming with

Datamiddot An interactive workflow for data analysismiddot Support for rapid prototyping research and experimentation

2358

Why Rcpp

middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples

middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure

middot Expressive as it allows for vectorised C++ using Rcpp Sugarmiddot Seamless access to all R objects vector matrix list

S3S4RefClass Environment Function hellipmiddot Speed gains for a variety of tasks Rcpp excels precisely

where R struggles loops function calls hellipmiddot Extensions greatly facilitates access to external libraries

using eg Rcpp modules

2458

Overview Speed

Speed Example 1 (due to Christian Robert)

Five different ways to compute 1(1 + x)

f lt- function(n x=1) for(i in 1n) x lt- 1(1+x)g lt- function(n x=1) for(i in 1n) x lt- (1(1+x))h lt- function(n x=1) for(i in 1n) x lt- (1+x)^(-1)j lt- function(n x=1) for(i in 1n) x lt- 11+xk lt- function(n x=1) for(i in 1n) x lt- 11+xlibrary(rbenchmark)N lt- 1e5benchmark(f(N1)g(N1)h(N1)j(N1)k(N1)order=relative)[14]

2658

Speed Example 1 (due to Christian Robert)

test replications elapsed relative 5 k(N 1) 100 6435 1000 1 f(N 1) 100 6609 1027 2 g(N 1) 100 7757 1205 4 j(N 1) 100 7882 1225 3 h(N 1) 100 11766 1828

2758

Speed Example 1 (due to Christian Robert)

Adding a C++ variant is easy

cppFunction(double m(int n double x)

for (int i=0 iltn i++)x = 1 (1+x)

return x

)

(We will learn more about cppFunction() later)

2858

Speed Example 1 (due to Christian Robert)

test replications elapsed relative 6 m(N 1) 100 0170 1000 1 f(N 1) 100 6854 40318 5 k(N 1) 100 7811 45947 4 j(N 1) 100 9183 54018 2 g(N 1) 100 9489 55818 3 h(N 1) 100 11725 68971

2958

Speed Example 2 (due to StackOverflow)

Consider a function defined as

f(n) such that n when n lt 2

f(n minus 1) + f(n minus 2) when n ge 2

3058

Speed Example 2 (due to StackOverflow)

R implementation and use

f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))

Using it on first 11 argumentssapply(010 f)

[1] 0 1 1 2 3 5 8 13 21 34 55

3158

Speed Example 2 (due to StackOverflow)

Timing

library(rbenchmark)benchmark(f(10) f(15) f(20))[14]

test replications elapsed relative 1 f(10) 100 0030 1000 2 f(15) 100 0335 11167 3 f(20) 100 3517 117233

3258

Speed Example 2 (due to StackOverflow)

int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))

deployed as

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

3358

Speed Example 2 (due to StackOverflow)

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(20) g(20) order=relative)[14]

test replications elapsed relative 2 g(20) 100 0011 1000 1 f(20) 100 3776 343273

A nice gain of a few orders of magnitude3458

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

3558

What Next

Hands-on

Programming with C++

middot C++ Basicsmiddot Debuggingmiddot Best Practices

and then on to Rcpp itself

3758

C++ Basics

Compiled not Interpreted

Need to compile and link

include ltcstdiogt

int main(void) printf(Hello worldn)return 0

3958

Compiled not Interpreted

Or streams output rather than printf

include ltiostreamgt

int main(void) stdcout ltlt Hello world ltlt stdendlreturn 0

4058

Compiled not Interpreted

g++ -o will compile and link

We will now look at an examples with explicit linking

4158

Compiled not Interpreted

include ltcstdiogt

define MATHLIB_STANDALONEinclude ltRmathhgt

int main(void) printf(N(01) 95th percentile 98fn

qnorm(095 00 10 1 0))

4258

Compiled not Interpreted

We may need to supply

middot header location via -Imiddot library location via -Lmiddot library via -llibraryname

g++ -Iusrinclude -c qnorm_rmathcppg++ -o qnorm_rmath qnorm_rmatho -Lusrlib -lRmath

4358

Statically Typed

middot R is dynamically typed x lt- 314 x lt- foo is validmiddot In C++ each variable must be declared before first usemiddot Common types are int and long (possibly with unsigned)float and double bool as well as char

middot No standard string type though stdstring is closemiddot All these variables types are scalars which is fundamentally

different from R where everything is a vectormiddot class (and struct) allow creation of composite types

classes add behaviour to data to form objectsmiddot Variables need to be declared cannot change

4458

C++ Basics A Better C

C++ is a Better C

middot control structures similar to what R offers for while ifswitch

middot functions are similar too but note the difference inpositional-only matching also same function name butdifferent arguments allowed in C++

middot pointers and memory management very different but lotsof issues people had with C can be avoided via STL (whichis something Rcpp promotes too)

middot sometimes still useful to know what a pointer is hellip

4658

Object-Oriented

This is a second key feature of C++ and it does it differentlyfrom S3 and S4

struct Date unsigned int yearunsigned int monthunsigned int day

struct Person char firstname[20]char lastname[20]struct Date birthdayunsigned long id

4758

Object-Oriented

Object-orientation in the C++ sense matches data with codeoperating on it

class Date private

unsigned int yearunsigned int monthunsigned int date

publicvoid setDate(int y int m int d)int getDay()int getMonth()int getYear()

4858

Generic Programming and the STL

The STL promotes generic programming

For example the sequence container types vector dequeand list all support

middot push_back() to insert at the endmiddot pop_back() to remove from the frontmiddot begin() returning an iterator to the first elementmiddot end() returning an iterator to just after the last elementmiddot size() for the number of elements

but only list has push_front() and pop_front()

Other useful containers set multiset map and multimap4958

Generic Programming and the STL

Traversal of containers can be achieved via iterators whichrequire suitable member functions begin() and end()

stdvectorltdoublegtconst_iterator sifor (si=sbegin() si = send() si++)

stdcout ltlt si ltlt stdendl

5058

Generic Programming and the STL

Another key STL part are algorithms

double sum = accumulate(sbegin() send() 0)

Some other STL algorithms are

middot find finds the first element equal to the supplied valuemiddot count counts the number of matching elementsmiddot transform applies a supplied function to each elementmiddot for_each sweeps over all elements does not altermiddot inner_product inner product of two vectors

5158

Template Programming

Template programming provides a lsquolanguage within C++rsquocode gets evaluated during compilation

One of the simplest template examples is

template lttypename Tgtconst Tamp min(const Tamp x const Tamp y)

return y lt x y x

This can now be used to compute the minimum between twoint variables or double or in fact any admissible typeproviding an operatorlt() for less-than comparison

5258

Template Programming

Another template example is a class squaring its argument

template lttypename Tgtclass square public stdunary_functionltTTgt public

T operator()(T t) const return tt

which can be used along with STL algorithms

transform(xbegin() xend() square)

5358

Further Reading

Books by Meyers are excellent

I also like the (free) C++ Annotations

C++ FAQ

Resources on StackOverflow such as

middot general info and its links egmiddot booklist

5458

Debugging

Debugging

Some tips

middot Generally painful old-school printf() still pervasivemiddot Debuggers go along with compilers gdb for gcc and g++lldb for the clang llvm family

middot Extra tools such as valgrind helpful for memory debuggingmiddot ldquoSanitizerrdquo (ASANUBSAN) in newer versions of g++ andclang++

5658

Best Practices

Best Practices

Version control highly recommended to become familiar withgit or svn

Editor in the long-run recommended to learn productivitytricks for one editor emacs vi eclipse RStudio hellip

5858

  • Overview
    • Why R
    • Why C++
    • Vision
    • Speed
      • What Next
      • C++ Basics
        • A Better C
          • Debugging
          • Best Practices
Page 19: Higher-Performance R via C++ - Part 1: Introductiondirk.eddelbuettel.com/papers/rcpp_uzuerich_2015_part1_intro.pdf · · C++11 is a big deal giving us new language features · While

Why C++

middot Asking Google leads to about 52 million hitsmiddot Wikipedia C++ is a statically typed free-form

multi-paradigm compiled general-purpose powerfulprogramming language

middot C++ is industrial-strength vendor-independent widely-usedand still evolving

middot In science amp research one of the most frequently-usedlanguages If there is something you want to use connectto it probably has a CC++ API

middot As a widely used language it also has good tool support(debuggers profilers code analysis)

1858

Why C++

Scott Meyers View C++ as a federation of languages

middot C provides a rich inheritance and interoperability as UnixWindows hellip are all build on C

middot Object-Oriented C++ (maybe just to provide endlessdiscussions about exactly what OO is or should be)

middot Templated C++ which is mighty powerful template metaprogramming unequalled in other languages

middot The Standard Template Library (STL) is a specific templatelibrary which is powerful but has its own conventions

middot C++11 (and C++14 and beyond) add enough to be calleda fifth language

NB Meyers original list of four languages appeared years before C++11

1958

Why C++

middot Mature yet currentmiddot Strong performance focus

middot You donrsquot pay for what you donrsquot usemiddot Leave no room for another language between the machine

level and C++

middot Yet also powerfully abstract and high-levelmiddot C++11 is a big deal giving us new language featuresmiddot While there are complexities Rcpp users are mostly shielded

2058

Overview Vision

Bell Labs May 1976

2258

Interface Vision

R offers us the best of both worlds

middot Compiled code withmiddot Access to proven libraries and algorithms in CC++Fortranmiddot Extremely high performance (in both serial and parallel modes)

middot Interpreted code withmiddot An accessible high-level language made for Programming with

Datamiddot An interactive workflow for data analysismiddot Support for rapid prototyping research and experimentation

2358

Why Rcpp

middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples

middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure

middot Expressive as it allows for vectorised C++ using Rcpp Sugarmiddot Seamless access to all R objects vector matrix list

S3S4RefClass Environment Function hellipmiddot Speed gains for a variety of tasks Rcpp excels precisely

where R struggles loops function calls hellipmiddot Extensions greatly facilitates access to external libraries

using eg Rcpp modules

2458

Overview Speed

Speed Example 1 (due to Christian Robert)

Five different ways to compute 1(1 + x)

f lt- function(n x=1) for(i in 1n) x lt- 1(1+x)g lt- function(n x=1) for(i in 1n) x lt- (1(1+x))h lt- function(n x=1) for(i in 1n) x lt- (1+x)^(-1)j lt- function(n x=1) for(i in 1n) x lt- 11+xk lt- function(n x=1) for(i in 1n) x lt- 11+xlibrary(rbenchmark)N lt- 1e5benchmark(f(N1)g(N1)h(N1)j(N1)k(N1)order=relative)[14]

2658

Speed Example 1 (due to Christian Robert)

test replications elapsed relative 5 k(N 1) 100 6435 1000 1 f(N 1) 100 6609 1027 2 g(N 1) 100 7757 1205 4 j(N 1) 100 7882 1225 3 h(N 1) 100 11766 1828

2758

Speed Example 1 (due to Christian Robert)

Adding a C++ variant is easy

cppFunction(double m(int n double x)

for (int i=0 iltn i++)x = 1 (1+x)

return x

)

(We will learn more about cppFunction() later)

2858

Speed Example 1 (due to Christian Robert)

test replications elapsed relative 6 m(N 1) 100 0170 1000 1 f(N 1) 100 6854 40318 5 k(N 1) 100 7811 45947 4 j(N 1) 100 9183 54018 2 g(N 1) 100 9489 55818 3 h(N 1) 100 11725 68971

2958

Speed Example 2 (due to StackOverflow)

Consider a function defined as

f(n) such that n when n lt 2

f(n minus 1) + f(n minus 2) when n ge 2

3058

Speed Example 2 (due to StackOverflow)

R implementation and use

f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))

Using it on first 11 argumentssapply(010 f)

[1] 0 1 1 2 3 5 8 13 21 34 55

3158

Speed Example 2 (due to StackOverflow)

Timing

library(rbenchmark)benchmark(f(10) f(15) f(20))[14]

test replications elapsed relative 1 f(10) 100 0030 1000 2 f(15) 100 0335 11167 3 f(20) 100 3517 117233

3258

Speed Example 2 (due to StackOverflow)

int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))

deployed as

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

3358

Speed Example 2 (due to StackOverflow)

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(20) g(20) order=relative)[14]

test replications elapsed relative 2 g(20) 100 0011 1000 1 f(20) 100 3776 343273

A nice gain of a few orders of magnitude3458

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

3558

What Next

Hands-on

Programming with C++

middot C++ Basicsmiddot Debuggingmiddot Best Practices

and then on to Rcpp itself

3758

C++ Basics

Compiled not Interpreted

Need to compile and link

include ltcstdiogt

int main(void) printf(Hello worldn)return 0

3958

Compiled not Interpreted

Or streams output rather than printf

include ltiostreamgt

int main(void) stdcout ltlt Hello world ltlt stdendlreturn 0

4058

Compiled not Interpreted

g++ -o will compile and link

We will now look at an examples with explicit linking

4158

Compiled not Interpreted

include ltcstdiogt

define MATHLIB_STANDALONEinclude ltRmathhgt

int main(void) printf(N(01) 95th percentile 98fn

qnorm(095 00 10 1 0))

4258

Compiled not Interpreted

We may need to supply

middot header location via -Imiddot library location via -Lmiddot library via -llibraryname

g++ -Iusrinclude -c qnorm_rmathcppg++ -o qnorm_rmath qnorm_rmatho -Lusrlib -lRmath

4358

Statically Typed

middot R is dynamically typed x lt- 314 x lt- foo is validmiddot In C++ each variable must be declared before first usemiddot Common types are int and long (possibly with unsigned)float and double bool as well as char

middot No standard string type though stdstring is closemiddot All these variables types are scalars which is fundamentally

different from R where everything is a vectormiddot class (and struct) allow creation of composite types

classes add behaviour to data to form objectsmiddot Variables need to be declared cannot change

4458

C++ Basics A Better C

C++ is a Better C

middot control structures similar to what R offers for while ifswitch

middot functions are similar too but note the difference inpositional-only matching also same function name butdifferent arguments allowed in C++

middot pointers and memory management very different but lotsof issues people had with C can be avoided via STL (whichis something Rcpp promotes too)

middot sometimes still useful to know what a pointer is hellip

4658

Object-Oriented

This is a second key feature of C++ and it does it differentlyfrom S3 and S4

struct Date unsigned int yearunsigned int monthunsigned int day

struct Person char firstname[20]char lastname[20]struct Date birthdayunsigned long id

4758

Object-Oriented

Object-orientation in the C++ sense matches data with codeoperating on it

class Date private

unsigned int yearunsigned int monthunsigned int date

publicvoid setDate(int y int m int d)int getDay()int getMonth()int getYear()

4858

Generic Programming and the STL

The STL promotes generic programming

For example the sequence container types vector dequeand list all support

middot push_back() to insert at the endmiddot pop_back() to remove from the frontmiddot begin() returning an iterator to the first elementmiddot end() returning an iterator to just after the last elementmiddot size() for the number of elements

but only list has push_front() and pop_front()

Other useful containers set multiset map and multimap4958

Generic Programming and the STL

Traversal of containers can be achieved via iterators whichrequire suitable member functions begin() and end()

stdvectorltdoublegtconst_iterator sifor (si=sbegin() si = send() si++)

stdcout ltlt si ltlt stdendl

5058

Generic Programming and the STL

Another key STL part are algorithms

double sum = accumulate(sbegin() send() 0)

Some other STL algorithms are

middot find finds the first element equal to the supplied valuemiddot count counts the number of matching elementsmiddot transform applies a supplied function to each elementmiddot for_each sweeps over all elements does not altermiddot inner_product inner product of two vectors

5158

Template Programming

Template programming provides a lsquolanguage within C++rsquocode gets evaluated during compilation

One of the simplest template examples is

template lttypename Tgtconst Tamp min(const Tamp x const Tamp y)

return y lt x y x

This can now be used to compute the minimum between twoint variables or double or in fact any admissible typeproviding an operatorlt() for less-than comparison

5258

Template Programming

Another template example is a class squaring its argument

template lttypename Tgtclass square public stdunary_functionltTTgt public

T operator()(T t) const return tt

which can be used along with STL algorithms

transform(xbegin() xend() square)

5358

Further Reading

Books by Meyers are excellent

I also like the (free) C++ Annotations

C++ FAQ

Resources on StackOverflow such as

middot general info and its links egmiddot booklist

5458

Debugging

Debugging

Some tips

middot Generally painful old-school printf() still pervasivemiddot Debuggers go along with compilers gdb for gcc and g++lldb for the clang llvm family

middot Extra tools such as valgrind helpful for memory debuggingmiddot ldquoSanitizerrdquo (ASANUBSAN) in newer versions of g++ andclang++

5658

Best Practices

Best Practices

Version control highly recommended to become familiar withgit or svn

Editor in the long-run recommended to learn productivitytricks for one editor emacs vi eclipse RStudio hellip

5858

  • Overview
    • Why R
    • Why C++
    • Vision
    • Speed
      • What Next
      • C++ Basics
        • A Better C
          • Debugging
          • Best Practices
Page 20: Higher-Performance R via C++ - Part 1: Introductiondirk.eddelbuettel.com/papers/rcpp_uzuerich_2015_part1_intro.pdf · · C++11 is a big deal giving us new language features · While

Why C++

Scott Meyers View C++ as a federation of languages

middot C provides a rich inheritance and interoperability as UnixWindows hellip are all build on C

middot Object-Oriented C++ (maybe just to provide endlessdiscussions about exactly what OO is or should be)

middot Templated C++ which is mighty powerful template metaprogramming unequalled in other languages

middot The Standard Template Library (STL) is a specific templatelibrary which is powerful but has its own conventions

middot C++11 (and C++14 and beyond) add enough to be calleda fifth language

NB Meyers original list of four languages appeared years before C++11

1958

Why C++

middot Mature yet currentmiddot Strong performance focus

middot You donrsquot pay for what you donrsquot usemiddot Leave no room for another language between the machine

level and C++

middot Yet also powerfully abstract and high-levelmiddot C++11 is a big deal giving us new language featuresmiddot While there are complexities Rcpp users are mostly shielded

2058

Overview Vision

Bell Labs May 1976

2258

Interface Vision

R offers us the best of both worlds

middot Compiled code withmiddot Access to proven libraries and algorithms in CC++Fortranmiddot Extremely high performance (in both serial and parallel modes)

middot Interpreted code withmiddot An accessible high-level language made for Programming with

Datamiddot An interactive workflow for data analysismiddot Support for rapid prototyping research and experimentation

2358

Why Rcpp

middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples

middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure

middot Expressive as it allows for vectorised C++ using Rcpp Sugarmiddot Seamless access to all R objects vector matrix list

S3S4RefClass Environment Function hellipmiddot Speed gains for a variety of tasks Rcpp excels precisely

where R struggles loops function calls hellipmiddot Extensions greatly facilitates access to external libraries

using eg Rcpp modules

2458

Overview Speed

Speed Example 1 (due to Christian Robert)

Five different ways to compute 1(1 + x)

f lt- function(n x=1) for(i in 1n) x lt- 1(1+x)g lt- function(n x=1) for(i in 1n) x lt- (1(1+x))h lt- function(n x=1) for(i in 1n) x lt- (1+x)^(-1)j lt- function(n x=1) for(i in 1n) x lt- 11+xk lt- function(n x=1) for(i in 1n) x lt- 11+xlibrary(rbenchmark)N lt- 1e5benchmark(f(N1)g(N1)h(N1)j(N1)k(N1)order=relative)[14]

2658

Speed Example 1 (due to Christian Robert)

test replications elapsed relative 5 k(N 1) 100 6435 1000 1 f(N 1) 100 6609 1027 2 g(N 1) 100 7757 1205 4 j(N 1) 100 7882 1225 3 h(N 1) 100 11766 1828

2758

Speed Example 1 (due to Christian Robert)

Adding a C++ variant is easy

cppFunction(double m(int n double x)

for (int i=0 iltn i++)x = 1 (1+x)

return x

)

(We will learn more about cppFunction() later)

2858

Speed Example 1 (due to Christian Robert)

test replications elapsed relative 6 m(N 1) 100 0170 1000 1 f(N 1) 100 6854 40318 5 k(N 1) 100 7811 45947 4 j(N 1) 100 9183 54018 2 g(N 1) 100 9489 55818 3 h(N 1) 100 11725 68971

2958

Speed Example 2 (due to StackOverflow)

Consider a function defined as

f(n) such that n when n lt 2

f(n minus 1) + f(n minus 2) when n ge 2

3058

Speed Example 2 (due to StackOverflow)

R implementation and use

f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))

Using it on first 11 argumentssapply(010 f)

[1] 0 1 1 2 3 5 8 13 21 34 55

3158

Speed Example 2 (due to StackOverflow)

Timing

library(rbenchmark)benchmark(f(10) f(15) f(20))[14]

test replications elapsed relative 1 f(10) 100 0030 1000 2 f(15) 100 0335 11167 3 f(20) 100 3517 117233

3258

Speed Example 2 (due to StackOverflow)

int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))

deployed as

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

3358

Speed Example 2 (due to StackOverflow)

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(20) g(20) order=relative)[14]

test replications elapsed relative 2 g(20) 100 0011 1000 1 f(20) 100 3776 343273

A nice gain of a few orders of magnitude3458

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

3558

What Next

Hands-on

Programming with C++

middot C++ Basicsmiddot Debuggingmiddot Best Practices

and then on to Rcpp itself

3758

C++ Basics

Compiled not Interpreted

Need to compile and link

include ltcstdiogt

int main(void) printf(Hello worldn)return 0

3958

Compiled not Interpreted

Or streams output rather than printf

include ltiostreamgt

int main(void) stdcout ltlt Hello world ltlt stdendlreturn 0

4058

Compiled not Interpreted

g++ -o will compile and link

We will now look at an examples with explicit linking

4158

Compiled not Interpreted

include ltcstdiogt

define MATHLIB_STANDALONEinclude ltRmathhgt

int main(void) printf(N(01) 95th percentile 98fn

qnorm(095 00 10 1 0))

4258

Compiled not Interpreted

We may need to supply

middot header location via -Imiddot library location via -Lmiddot library via -llibraryname

g++ -Iusrinclude -c qnorm_rmathcppg++ -o qnorm_rmath qnorm_rmatho -Lusrlib -lRmath

4358

Statically Typed

middot R is dynamically typed x lt- 314 x lt- foo is validmiddot In C++ each variable must be declared before first usemiddot Common types are int and long (possibly with unsigned)float and double bool as well as char

middot No standard string type though stdstring is closemiddot All these variables types are scalars which is fundamentally

different from R where everything is a vectormiddot class (and struct) allow creation of composite types

classes add behaviour to data to form objectsmiddot Variables need to be declared cannot change

4458

C++ Basics A Better C

C++ is a Better C

middot control structures similar to what R offers for while ifswitch

middot functions are similar too but note the difference inpositional-only matching also same function name butdifferent arguments allowed in C++

middot pointers and memory management very different but lotsof issues people had with C can be avoided via STL (whichis something Rcpp promotes too)

middot sometimes still useful to know what a pointer is hellip

4658

Object-Oriented

This is a second key feature of C++ and it does it differentlyfrom S3 and S4

struct Date unsigned int yearunsigned int monthunsigned int day

struct Person char firstname[20]char lastname[20]struct Date birthdayunsigned long id

4758

Object-Oriented

Object-orientation in the C++ sense matches data with codeoperating on it

class Date private

unsigned int yearunsigned int monthunsigned int date

publicvoid setDate(int y int m int d)int getDay()int getMonth()int getYear()

4858

Generic Programming and the STL

The STL promotes generic programming

For example the sequence container types vector dequeand list all support

middot push_back() to insert at the endmiddot pop_back() to remove from the frontmiddot begin() returning an iterator to the first elementmiddot end() returning an iterator to just after the last elementmiddot size() for the number of elements

but only list has push_front() and pop_front()

Other useful containers set multiset map and multimap4958

Generic Programming and the STL

Traversal of containers can be achieved via iterators whichrequire suitable member functions begin() and end()

stdvectorltdoublegtconst_iterator sifor (si=sbegin() si = send() si++)

stdcout ltlt si ltlt stdendl

5058

Generic Programming and the STL

Another key STL part are algorithms

double sum = accumulate(sbegin() send() 0)

Some other STL algorithms are

middot find finds the first element equal to the supplied valuemiddot count counts the number of matching elementsmiddot transform applies a supplied function to each elementmiddot for_each sweeps over all elements does not altermiddot inner_product inner product of two vectors

5158

Template Programming

Template programming provides a lsquolanguage within C++rsquocode gets evaluated during compilation

One of the simplest template examples is

template lttypename Tgtconst Tamp min(const Tamp x const Tamp y)

return y lt x y x

This can now be used to compute the minimum between twoint variables or double or in fact any admissible typeproviding an operatorlt() for less-than comparison

5258

Template Programming

Another template example is a class squaring its argument

template lttypename Tgtclass square public stdunary_functionltTTgt public

T operator()(T t) const return tt

which can be used along with STL algorithms

transform(xbegin() xend() square)

5358

Further Reading

Books by Meyers are excellent

I also like the (free) C++ Annotations

C++ FAQ

Resources on StackOverflow such as

middot general info and its links egmiddot booklist

5458

Debugging

Debugging

Some tips

middot Generally painful old-school printf() still pervasivemiddot Debuggers go along with compilers gdb for gcc and g++lldb for the clang llvm family

middot Extra tools such as valgrind helpful for memory debuggingmiddot ldquoSanitizerrdquo (ASANUBSAN) in newer versions of g++ andclang++

5658

Best Practices

Best Practices

Version control highly recommended to become familiar withgit or svn

Editor in the long-run recommended to learn productivitytricks for one editor emacs vi eclipse RStudio hellip

5858

  • Overview
    • Why R
    • Why C++
    • Vision
    • Speed
      • What Next
      • C++ Basics
        • A Better C
          • Debugging
          • Best Practices
Page 21: Higher-Performance R via C++ - Part 1: Introductiondirk.eddelbuettel.com/papers/rcpp_uzuerich_2015_part1_intro.pdf · · C++11 is a big deal giving us new language features · While

Why C++

middot Mature yet currentmiddot Strong performance focus

middot You donrsquot pay for what you donrsquot usemiddot Leave no room for another language between the machine

level and C++

middot Yet also powerfully abstract and high-levelmiddot C++11 is a big deal giving us new language featuresmiddot While there are complexities Rcpp users are mostly shielded

2058

Overview Vision

Bell Labs May 1976

2258

Interface Vision

R offers us the best of both worlds

middot Compiled code withmiddot Access to proven libraries and algorithms in CC++Fortranmiddot Extremely high performance (in both serial and parallel modes)

middot Interpreted code withmiddot An accessible high-level language made for Programming with

Datamiddot An interactive workflow for data analysismiddot Support for rapid prototyping research and experimentation

2358

Why Rcpp

middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples

middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure

middot Expressive as it allows for vectorised C++ using Rcpp Sugarmiddot Seamless access to all R objects vector matrix list

S3S4RefClass Environment Function hellipmiddot Speed gains for a variety of tasks Rcpp excels precisely

where R struggles loops function calls hellipmiddot Extensions greatly facilitates access to external libraries

using eg Rcpp modules

2458

Overview Speed

Speed Example 1 (due to Christian Robert)

Five different ways to compute 1(1 + x)

f lt- function(n x=1) for(i in 1n) x lt- 1(1+x)g lt- function(n x=1) for(i in 1n) x lt- (1(1+x))h lt- function(n x=1) for(i in 1n) x lt- (1+x)^(-1)j lt- function(n x=1) for(i in 1n) x lt- 11+xk lt- function(n x=1) for(i in 1n) x lt- 11+xlibrary(rbenchmark)N lt- 1e5benchmark(f(N1)g(N1)h(N1)j(N1)k(N1)order=relative)[14]

2658

Speed Example 1 (due to Christian Robert)

test replications elapsed relative 5 k(N 1) 100 6435 1000 1 f(N 1) 100 6609 1027 2 g(N 1) 100 7757 1205 4 j(N 1) 100 7882 1225 3 h(N 1) 100 11766 1828

2758

Speed Example 1 (due to Christian Robert)

Adding a C++ variant is easy

cppFunction(double m(int n double x)

for (int i=0 iltn i++)x = 1 (1+x)

return x

)

(We will learn more about cppFunction() later)

2858

Speed Example 1 (due to Christian Robert)

test replications elapsed relative 6 m(N 1) 100 0170 1000 1 f(N 1) 100 6854 40318 5 k(N 1) 100 7811 45947 4 j(N 1) 100 9183 54018 2 g(N 1) 100 9489 55818 3 h(N 1) 100 11725 68971

2958

Speed Example 2 (due to StackOverflow)

Consider a function defined as

f(n) such that n when n lt 2

f(n minus 1) + f(n minus 2) when n ge 2

3058

Speed Example 2 (due to StackOverflow)

R implementation and use

f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))

Using it on first 11 argumentssapply(010 f)

[1] 0 1 1 2 3 5 8 13 21 34 55

3158

Speed Example 2 (due to StackOverflow)

Timing

library(rbenchmark)benchmark(f(10) f(15) f(20))[14]

test replications elapsed relative 1 f(10) 100 0030 1000 2 f(15) 100 0335 11167 3 f(20) 100 3517 117233

3258

Speed Example 2 (due to StackOverflow)

int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))

deployed as

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

3358

Speed Example 2 (due to StackOverflow)

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(20) g(20) order=relative)[14]

test replications elapsed relative 2 g(20) 100 0011 1000 1 f(20) 100 3776 343273

A nice gain of a few orders of magnitude3458

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

3558

What Next

Hands-on

Programming with C++

middot C++ Basicsmiddot Debuggingmiddot Best Practices

and then on to Rcpp itself

3758

C++ Basics

Compiled not Interpreted

Need to compile and link

include ltcstdiogt

int main(void) printf(Hello worldn)return 0

3958

Compiled not Interpreted

Or streams output rather than printf

include ltiostreamgt

int main(void) stdcout ltlt Hello world ltlt stdendlreturn 0

4058

Compiled not Interpreted

g++ -o will compile and link

We will now look at an examples with explicit linking

4158

Compiled not Interpreted

include ltcstdiogt

define MATHLIB_STANDALONEinclude ltRmathhgt

int main(void) printf(N(01) 95th percentile 98fn

qnorm(095 00 10 1 0))

4258

Compiled not Interpreted

We may need to supply

middot header location via -Imiddot library location via -Lmiddot library via -llibraryname

g++ -Iusrinclude -c qnorm_rmathcppg++ -o qnorm_rmath qnorm_rmatho -Lusrlib -lRmath

4358

Statically Typed

middot R is dynamically typed x lt- 314 x lt- foo is validmiddot In C++ each variable must be declared before first usemiddot Common types are int and long (possibly with unsigned)float and double bool as well as char

middot No standard string type though stdstring is closemiddot All these variables types are scalars which is fundamentally

different from R where everything is a vectormiddot class (and struct) allow creation of composite types

classes add behaviour to data to form objectsmiddot Variables need to be declared cannot change

4458

C++ Basics A Better C

C++ is a Better C

middot control structures similar to what R offers for while ifswitch

middot functions are similar too but note the difference inpositional-only matching also same function name butdifferent arguments allowed in C++

middot pointers and memory management very different but lotsof issues people had with C can be avoided via STL (whichis something Rcpp promotes too)

middot sometimes still useful to know what a pointer is hellip

4658

Object-Oriented

This is a second key feature of C++ and it does it differentlyfrom S3 and S4

struct Date unsigned int yearunsigned int monthunsigned int day

struct Person char firstname[20]char lastname[20]struct Date birthdayunsigned long id

4758

Object-Oriented

Object-orientation in the C++ sense matches data with codeoperating on it

class Date private

unsigned int yearunsigned int monthunsigned int date

publicvoid setDate(int y int m int d)int getDay()int getMonth()int getYear()

4858

Generic Programming and the STL

The STL promotes generic programming

For example the sequence container types vector dequeand list all support

middot push_back() to insert at the endmiddot pop_back() to remove from the frontmiddot begin() returning an iterator to the first elementmiddot end() returning an iterator to just after the last elementmiddot size() for the number of elements

but only list has push_front() and pop_front()

Other useful containers set multiset map and multimap4958

Generic Programming and the STL

Traversal of containers can be achieved via iterators whichrequire suitable member functions begin() and end()

stdvectorltdoublegtconst_iterator sifor (si=sbegin() si = send() si++)

stdcout ltlt si ltlt stdendl

5058

Generic Programming and the STL

Another key STL part are algorithms

double sum = accumulate(sbegin() send() 0)

Some other STL algorithms are

middot find finds the first element equal to the supplied valuemiddot count counts the number of matching elementsmiddot transform applies a supplied function to each elementmiddot for_each sweeps over all elements does not altermiddot inner_product inner product of two vectors

5158

Template Programming

Template programming provides a lsquolanguage within C++rsquocode gets evaluated during compilation

One of the simplest template examples is

template lttypename Tgtconst Tamp min(const Tamp x const Tamp y)

return y lt x y x

This can now be used to compute the minimum between twoint variables or double or in fact any admissible typeproviding an operatorlt() for less-than comparison

5258

Template Programming

Another template example is a class squaring its argument

template lttypename Tgtclass square public stdunary_functionltTTgt public

T operator()(T t) const return tt

which can be used along with STL algorithms

transform(xbegin() xend() square)

5358

Further Reading

Books by Meyers are excellent

I also like the (free) C++ Annotations

C++ FAQ

Resources on StackOverflow such as

middot general info and its links egmiddot booklist

5458

Debugging

Debugging

Some tips

middot Generally painful old-school printf() still pervasivemiddot Debuggers go along with compilers gdb for gcc and g++lldb for the clang llvm family

middot Extra tools such as valgrind helpful for memory debuggingmiddot ldquoSanitizerrdquo (ASANUBSAN) in newer versions of g++ andclang++

5658

Best Practices

Best Practices

Version control highly recommended to become familiar withgit or svn

Editor in the long-run recommended to learn productivitytricks for one editor emacs vi eclipse RStudio hellip

5858

  • Overview
    • Why R
    • Why C++
    • Vision
    • Speed
      • What Next
      • C++ Basics
        • A Better C
          • Debugging
          • Best Practices
Page 22: Higher-Performance R via C++ - Part 1: Introductiondirk.eddelbuettel.com/papers/rcpp_uzuerich_2015_part1_intro.pdf · · C++11 is a big deal giving us new language features · While

Overview Vision

Bell Labs May 1976

2258

Interface Vision

R offers us the best of both worlds

middot Compiled code withmiddot Access to proven libraries and algorithms in CC++Fortranmiddot Extremely high performance (in both serial and parallel modes)

middot Interpreted code withmiddot An accessible high-level language made for Programming with

Datamiddot An interactive workflow for data analysismiddot Support for rapid prototyping research and experimentation

2358

Why Rcpp

middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples

middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure

middot Expressive as it allows for vectorised C++ using Rcpp Sugarmiddot Seamless access to all R objects vector matrix list

S3S4RefClass Environment Function hellipmiddot Speed gains for a variety of tasks Rcpp excels precisely

where R struggles loops function calls hellipmiddot Extensions greatly facilitates access to external libraries

using eg Rcpp modules

2458

Overview Speed

Speed Example 1 (due to Christian Robert)

Five different ways to compute 1(1 + x)

f lt- function(n x=1) for(i in 1n) x lt- 1(1+x)g lt- function(n x=1) for(i in 1n) x lt- (1(1+x))h lt- function(n x=1) for(i in 1n) x lt- (1+x)^(-1)j lt- function(n x=1) for(i in 1n) x lt- 11+xk lt- function(n x=1) for(i in 1n) x lt- 11+xlibrary(rbenchmark)N lt- 1e5benchmark(f(N1)g(N1)h(N1)j(N1)k(N1)order=relative)[14]

2658

Speed Example 1 (due to Christian Robert)

test replications elapsed relative 5 k(N 1) 100 6435 1000 1 f(N 1) 100 6609 1027 2 g(N 1) 100 7757 1205 4 j(N 1) 100 7882 1225 3 h(N 1) 100 11766 1828

2758

Speed Example 1 (due to Christian Robert)

Adding a C++ variant is easy

cppFunction(double m(int n double x)

for (int i=0 iltn i++)x = 1 (1+x)

return x

)

(We will learn more about cppFunction() later)

2858

Speed Example 1 (due to Christian Robert)

test replications elapsed relative 6 m(N 1) 100 0170 1000 1 f(N 1) 100 6854 40318 5 k(N 1) 100 7811 45947 4 j(N 1) 100 9183 54018 2 g(N 1) 100 9489 55818 3 h(N 1) 100 11725 68971

2958

Speed Example 2 (due to StackOverflow)

Consider a function defined as

f(n) such that n when n lt 2

f(n minus 1) + f(n minus 2) when n ge 2

3058

Speed Example 2 (due to StackOverflow)

R implementation and use

f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))

Using it on first 11 argumentssapply(010 f)

[1] 0 1 1 2 3 5 8 13 21 34 55

3158

Speed Example 2 (due to StackOverflow)

Timing

library(rbenchmark)benchmark(f(10) f(15) f(20))[14]

test replications elapsed relative 1 f(10) 100 0030 1000 2 f(15) 100 0335 11167 3 f(20) 100 3517 117233

3258

Speed Example 2 (due to StackOverflow)

int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))

deployed as

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

3358

Speed Example 2 (due to StackOverflow)

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(20) g(20) order=relative)[14]

test replications elapsed relative 2 g(20) 100 0011 1000 1 f(20) 100 3776 343273

A nice gain of a few orders of magnitude3458

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

3558

What Next

Hands-on

Programming with C++

middot C++ Basicsmiddot Debuggingmiddot Best Practices

and then on to Rcpp itself

3758

C++ Basics

Compiled not Interpreted

Need to compile and link

include ltcstdiogt

int main(void) printf(Hello worldn)return 0

3958

Compiled not Interpreted

Or streams output rather than printf

include ltiostreamgt

int main(void) stdcout ltlt Hello world ltlt stdendlreturn 0

4058

Compiled not Interpreted

g++ -o will compile and link

We will now look at an examples with explicit linking

4158

Compiled not Interpreted

include ltcstdiogt

define MATHLIB_STANDALONEinclude ltRmathhgt

int main(void) printf(N(01) 95th percentile 98fn

qnorm(095 00 10 1 0))

4258

Compiled not Interpreted

We may need to supply

middot header location via -Imiddot library location via -Lmiddot library via -llibraryname

g++ -Iusrinclude -c qnorm_rmathcppg++ -o qnorm_rmath qnorm_rmatho -Lusrlib -lRmath

4358

Statically Typed

middot R is dynamically typed x lt- 314 x lt- foo is validmiddot In C++ each variable must be declared before first usemiddot Common types are int and long (possibly with unsigned)float and double bool as well as char

middot No standard string type though stdstring is closemiddot All these variables types are scalars which is fundamentally

different from R where everything is a vectormiddot class (and struct) allow creation of composite types

classes add behaviour to data to form objectsmiddot Variables need to be declared cannot change

4458

C++ Basics A Better C

C++ is a Better C

middot control structures similar to what R offers for while ifswitch

middot functions are similar too but note the difference inpositional-only matching also same function name butdifferent arguments allowed in C++

middot pointers and memory management very different but lotsof issues people had with C can be avoided via STL (whichis something Rcpp promotes too)

middot sometimes still useful to know what a pointer is hellip

4658

Object-Oriented

This is a second key feature of C++ and it does it differentlyfrom S3 and S4

struct Date unsigned int yearunsigned int monthunsigned int day

struct Person char firstname[20]char lastname[20]struct Date birthdayunsigned long id

4758

Object-Oriented

Object-orientation in the C++ sense matches data with codeoperating on it

class Date private

unsigned int yearunsigned int monthunsigned int date

publicvoid setDate(int y int m int d)int getDay()int getMonth()int getYear()

4858

Generic Programming and the STL

The STL promotes generic programming

For example the sequence container types vector dequeand list all support

middot push_back() to insert at the endmiddot pop_back() to remove from the frontmiddot begin() returning an iterator to the first elementmiddot end() returning an iterator to just after the last elementmiddot size() for the number of elements

but only list has push_front() and pop_front()

Other useful containers set multiset map and multimap4958

Generic Programming and the STL

Traversal of containers can be achieved via iterators whichrequire suitable member functions begin() and end()

stdvectorltdoublegtconst_iterator sifor (si=sbegin() si = send() si++)

stdcout ltlt si ltlt stdendl

5058

Generic Programming and the STL

Another key STL part are algorithms

double sum = accumulate(sbegin() send() 0)

Some other STL algorithms are

middot find finds the first element equal to the supplied valuemiddot count counts the number of matching elementsmiddot transform applies a supplied function to each elementmiddot for_each sweeps over all elements does not altermiddot inner_product inner product of two vectors

5158

Template Programming

Template programming provides a lsquolanguage within C++rsquocode gets evaluated during compilation

One of the simplest template examples is

template lttypename Tgtconst Tamp min(const Tamp x const Tamp y)

return y lt x y x

This can now be used to compute the minimum between twoint variables or double or in fact any admissible typeproviding an operatorlt() for less-than comparison

5258

Template Programming

Another template example is a class squaring its argument

template lttypename Tgtclass square public stdunary_functionltTTgt public

T operator()(T t) const return tt

which can be used along with STL algorithms

transform(xbegin() xend() square)

5358

Further Reading

Books by Meyers are excellent

I also like the (free) C++ Annotations

C++ FAQ

Resources on StackOverflow such as

middot general info and its links egmiddot booklist

5458

Debugging

Debugging

Some tips

middot Generally painful old-school printf() still pervasivemiddot Debuggers go along with compilers gdb for gcc and g++lldb for the clang llvm family

middot Extra tools such as valgrind helpful for memory debuggingmiddot ldquoSanitizerrdquo (ASANUBSAN) in newer versions of g++ andclang++

5658

Best Practices

Best Practices

Version control highly recommended to become familiar withgit or svn

Editor in the long-run recommended to learn productivitytricks for one editor emacs vi eclipse RStudio hellip

5858

  • Overview
    • Why R
    • Why C++
    • Vision
    • Speed
      • What Next
      • C++ Basics
        • A Better C
          • Debugging
          • Best Practices
Page 23: Higher-Performance R via C++ - Part 1: Introductiondirk.eddelbuettel.com/papers/rcpp_uzuerich_2015_part1_intro.pdf · · C++11 is a big deal giving us new language features · While

Bell Labs May 1976

2258

Interface Vision

R offers us the best of both worlds

middot Compiled code withmiddot Access to proven libraries and algorithms in CC++Fortranmiddot Extremely high performance (in both serial and parallel modes)

middot Interpreted code withmiddot An accessible high-level language made for Programming with

Datamiddot An interactive workflow for data analysismiddot Support for rapid prototyping research and experimentation

2358

Why Rcpp

middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples

middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure

middot Expressive as it allows for vectorised C++ using Rcpp Sugarmiddot Seamless access to all R objects vector matrix list

S3S4RefClass Environment Function hellipmiddot Speed gains for a variety of tasks Rcpp excels precisely

where R struggles loops function calls hellipmiddot Extensions greatly facilitates access to external libraries

using eg Rcpp modules

2458

Overview Speed

Speed Example 1 (due to Christian Robert)

Five different ways to compute 1(1 + x)

f lt- function(n x=1) for(i in 1n) x lt- 1(1+x)g lt- function(n x=1) for(i in 1n) x lt- (1(1+x))h lt- function(n x=1) for(i in 1n) x lt- (1+x)^(-1)j lt- function(n x=1) for(i in 1n) x lt- 11+xk lt- function(n x=1) for(i in 1n) x lt- 11+xlibrary(rbenchmark)N lt- 1e5benchmark(f(N1)g(N1)h(N1)j(N1)k(N1)order=relative)[14]

2658

Speed Example 1 (due to Christian Robert)

test replications elapsed relative 5 k(N 1) 100 6435 1000 1 f(N 1) 100 6609 1027 2 g(N 1) 100 7757 1205 4 j(N 1) 100 7882 1225 3 h(N 1) 100 11766 1828

2758

Speed Example 1 (due to Christian Robert)

Adding a C++ variant is easy

cppFunction(double m(int n double x)

for (int i=0 iltn i++)x = 1 (1+x)

return x

)

(We will learn more about cppFunction() later)

2858

Speed Example 1 (due to Christian Robert)

test replications elapsed relative 6 m(N 1) 100 0170 1000 1 f(N 1) 100 6854 40318 5 k(N 1) 100 7811 45947 4 j(N 1) 100 9183 54018 2 g(N 1) 100 9489 55818 3 h(N 1) 100 11725 68971

2958

Speed Example 2 (due to StackOverflow)

Consider a function defined as

f(n) such that n when n lt 2

f(n minus 1) + f(n minus 2) when n ge 2

3058

Speed Example 2 (due to StackOverflow)

R implementation and use

f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))

Using it on first 11 argumentssapply(010 f)

[1] 0 1 1 2 3 5 8 13 21 34 55

3158

Speed Example 2 (due to StackOverflow)

Timing

library(rbenchmark)benchmark(f(10) f(15) f(20))[14]

test replications elapsed relative 1 f(10) 100 0030 1000 2 f(15) 100 0335 11167 3 f(20) 100 3517 117233

3258

Speed Example 2 (due to StackOverflow)

int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))

deployed as

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

3358

Speed Example 2 (due to StackOverflow)

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(20) g(20) order=relative)[14]

test replications elapsed relative 2 g(20) 100 0011 1000 1 f(20) 100 3776 343273

A nice gain of a few orders of magnitude3458

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

3558

What Next

Hands-on

Programming with C++

middot C++ Basicsmiddot Debuggingmiddot Best Practices

and then on to Rcpp itself

3758

C++ Basics

Compiled not Interpreted

Need to compile and link

include ltcstdiogt

int main(void) printf(Hello worldn)return 0

3958

Compiled not Interpreted

Or streams output rather than printf

include ltiostreamgt

int main(void) stdcout ltlt Hello world ltlt stdendlreturn 0

4058

Compiled not Interpreted

g++ -o will compile and link

We will now look at an examples with explicit linking

4158

Compiled not Interpreted

include ltcstdiogt

define MATHLIB_STANDALONEinclude ltRmathhgt

int main(void) printf(N(01) 95th percentile 98fn

qnorm(095 00 10 1 0))

4258

Compiled not Interpreted

We may need to supply

middot header location via -Imiddot library location via -Lmiddot library via -llibraryname

g++ -Iusrinclude -c qnorm_rmathcppg++ -o qnorm_rmath qnorm_rmatho -Lusrlib -lRmath

4358

Statically Typed

middot R is dynamically typed x lt- 314 x lt- foo is validmiddot In C++ each variable must be declared before first usemiddot Common types are int and long (possibly with unsigned)float and double bool as well as char

middot No standard string type though stdstring is closemiddot All these variables types are scalars which is fundamentally

different from R where everything is a vectormiddot class (and struct) allow creation of composite types

classes add behaviour to data to form objectsmiddot Variables need to be declared cannot change

4458

C++ Basics A Better C

C++ is a Better C

middot control structures similar to what R offers for while ifswitch

middot functions are similar too but note the difference inpositional-only matching also same function name butdifferent arguments allowed in C++

middot pointers and memory management very different but lotsof issues people had with C can be avoided via STL (whichis something Rcpp promotes too)

middot sometimes still useful to know what a pointer is hellip

4658

Object-Oriented

This is a second key feature of C++ and it does it differentlyfrom S3 and S4

struct Date unsigned int yearunsigned int monthunsigned int day

struct Person char firstname[20]char lastname[20]struct Date birthdayunsigned long id

4758

Object-Oriented

Object-orientation in the C++ sense matches data with codeoperating on it

class Date private

unsigned int yearunsigned int monthunsigned int date

publicvoid setDate(int y int m int d)int getDay()int getMonth()int getYear()

4858

Generic Programming and the STL

The STL promotes generic programming

For example the sequence container types vector dequeand list all support

middot push_back() to insert at the endmiddot pop_back() to remove from the frontmiddot begin() returning an iterator to the first elementmiddot end() returning an iterator to just after the last elementmiddot size() for the number of elements

but only list has push_front() and pop_front()

Other useful containers set multiset map and multimap4958

Generic Programming and the STL

Traversal of containers can be achieved via iterators whichrequire suitable member functions begin() and end()

stdvectorltdoublegtconst_iterator sifor (si=sbegin() si = send() si++)

stdcout ltlt si ltlt stdendl

5058

Generic Programming and the STL

Another key STL part are algorithms

double sum = accumulate(sbegin() send() 0)

Some other STL algorithms are

middot find finds the first element equal to the supplied valuemiddot count counts the number of matching elementsmiddot transform applies a supplied function to each elementmiddot for_each sweeps over all elements does not altermiddot inner_product inner product of two vectors

5158

Template Programming

Template programming provides a lsquolanguage within C++rsquocode gets evaluated during compilation

One of the simplest template examples is

template lttypename Tgtconst Tamp min(const Tamp x const Tamp y)

return y lt x y x

This can now be used to compute the minimum between twoint variables or double or in fact any admissible typeproviding an operatorlt() for less-than comparison

5258

Template Programming

Another template example is a class squaring its argument

template lttypename Tgtclass square public stdunary_functionltTTgt public

T operator()(T t) const return tt

which can be used along with STL algorithms

transform(xbegin() xend() square)

5358

Further Reading

Books by Meyers are excellent

I also like the (free) C++ Annotations

C++ FAQ

Resources on StackOverflow such as

middot general info and its links egmiddot booklist

5458

Debugging

Debugging

Some tips

middot Generally painful old-school printf() still pervasivemiddot Debuggers go along with compilers gdb for gcc and g++lldb for the clang llvm family

middot Extra tools such as valgrind helpful for memory debuggingmiddot ldquoSanitizerrdquo (ASANUBSAN) in newer versions of g++ andclang++

5658

Best Practices

Best Practices

Version control highly recommended to become familiar withgit or svn

Editor in the long-run recommended to learn productivitytricks for one editor emacs vi eclipse RStudio hellip

5858

  • Overview
    • Why R
    • Why C++
    • Vision
    • Speed
      • What Next
      • C++ Basics
        • A Better C
          • Debugging
          • Best Practices
Page 24: Higher-Performance R via C++ - Part 1: Introductiondirk.eddelbuettel.com/papers/rcpp_uzuerich_2015_part1_intro.pdf · · C++11 is a big deal giving us new language features · While

Interface Vision

R offers us the best of both worlds

middot Compiled code withmiddot Access to proven libraries and algorithms in CC++Fortranmiddot Extremely high performance (in both serial and parallel modes)

middot Interpreted code withmiddot An accessible high-level language made for Programming with

Datamiddot An interactive workflow for data analysismiddot Support for rapid prototyping research and experimentation

2358

Why Rcpp

middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples

middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure

middot Expressive as it allows for vectorised C++ using Rcpp Sugarmiddot Seamless access to all R objects vector matrix list

S3S4RefClass Environment Function hellipmiddot Speed gains for a variety of tasks Rcpp excels precisely

where R struggles loops function calls hellipmiddot Extensions greatly facilitates access to external libraries

using eg Rcpp modules

2458

Overview Speed

Speed Example 1 (due to Christian Robert)

Five different ways to compute 1(1 + x)

f lt- function(n x=1) for(i in 1n) x lt- 1(1+x)g lt- function(n x=1) for(i in 1n) x lt- (1(1+x))h lt- function(n x=1) for(i in 1n) x lt- (1+x)^(-1)j lt- function(n x=1) for(i in 1n) x lt- 11+xk lt- function(n x=1) for(i in 1n) x lt- 11+xlibrary(rbenchmark)N lt- 1e5benchmark(f(N1)g(N1)h(N1)j(N1)k(N1)order=relative)[14]

2658

Speed Example 1 (due to Christian Robert)

test replications elapsed relative 5 k(N 1) 100 6435 1000 1 f(N 1) 100 6609 1027 2 g(N 1) 100 7757 1205 4 j(N 1) 100 7882 1225 3 h(N 1) 100 11766 1828

2758

Speed Example 1 (due to Christian Robert)

Adding a C++ variant is easy

cppFunction(double m(int n double x)

for (int i=0 iltn i++)x = 1 (1+x)

return x

)

(We will learn more about cppFunction() later)

2858

Speed Example 1 (due to Christian Robert)

test replications elapsed relative 6 m(N 1) 100 0170 1000 1 f(N 1) 100 6854 40318 5 k(N 1) 100 7811 45947 4 j(N 1) 100 9183 54018 2 g(N 1) 100 9489 55818 3 h(N 1) 100 11725 68971

2958

Speed Example 2 (due to StackOverflow)

Consider a function defined as

f(n) such that n when n lt 2

f(n minus 1) + f(n minus 2) when n ge 2

3058

Speed Example 2 (due to StackOverflow)

R implementation and use

f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))

Using it on first 11 argumentssapply(010 f)

[1] 0 1 1 2 3 5 8 13 21 34 55

3158

Speed Example 2 (due to StackOverflow)

Timing

library(rbenchmark)benchmark(f(10) f(15) f(20))[14]

test replications elapsed relative 1 f(10) 100 0030 1000 2 f(15) 100 0335 11167 3 f(20) 100 3517 117233

3258

Speed Example 2 (due to StackOverflow)

int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))

deployed as

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

3358

Speed Example 2 (due to StackOverflow)

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(20) g(20) order=relative)[14]

test replications elapsed relative 2 g(20) 100 0011 1000 1 f(20) 100 3776 343273

A nice gain of a few orders of magnitude3458

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

3558

What Next

Hands-on

Programming with C++

middot C++ Basicsmiddot Debuggingmiddot Best Practices

and then on to Rcpp itself

3758

C++ Basics

Compiled not Interpreted

Need to compile and link

include ltcstdiogt

int main(void) printf(Hello worldn)return 0

3958

Compiled not Interpreted

Or streams output rather than printf

include ltiostreamgt

int main(void) stdcout ltlt Hello world ltlt stdendlreturn 0

4058

Compiled not Interpreted

g++ -o will compile and link

We will now look at an examples with explicit linking

4158

Compiled not Interpreted

include ltcstdiogt

define MATHLIB_STANDALONEinclude ltRmathhgt

int main(void) printf(N(01) 95th percentile 98fn

qnorm(095 00 10 1 0))

4258

Compiled not Interpreted

We may need to supply

middot header location via -Imiddot library location via -Lmiddot library via -llibraryname

g++ -Iusrinclude -c qnorm_rmathcppg++ -o qnorm_rmath qnorm_rmatho -Lusrlib -lRmath

4358

Statically Typed

middot R is dynamically typed x lt- 314 x lt- foo is validmiddot In C++ each variable must be declared before first usemiddot Common types are int and long (possibly with unsigned)float and double bool as well as char

middot No standard string type though stdstring is closemiddot All these variables types are scalars which is fundamentally

different from R where everything is a vectormiddot class (and struct) allow creation of composite types

classes add behaviour to data to form objectsmiddot Variables need to be declared cannot change

4458

C++ Basics A Better C

C++ is a Better C

middot control structures similar to what R offers for while ifswitch

middot functions are similar too but note the difference inpositional-only matching also same function name butdifferent arguments allowed in C++

middot pointers and memory management very different but lotsof issues people had with C can be avoided via STL (whichis something Rcpp promotes too)

middot sometimes still useful to know what a pointer is hellip

4658

Object-Oriented

This is a second key feature of C++ and it does it differentlyfrom S3 and S4

struct Date unsigned int yearunsigned int monthunsigned int day

struct Person char firstname[20]char lastname[20]struct Date birthdayunsigned long id

4758

Object-Oriented

Object-orientation in the C++ sense matches data with codeoperating on it

class Date private

unsigned int yearunsigned int monthunsigned int date

publicvoid setDate(int y int m int d)int getDay()int getMonth()int getYear()

4858

Generic Programming and the STL

The STL promotes generic programming

For example the sequence container types vector dequeand list all support

middot push_back() to insert at the endmiddot pop_back() to remove from the frontmiddot begin() returning an iterator to the first elementmiddot end() returning an iterator to just after the last elementmiddot size() for the number of elements

but only list has push_front() and pop_front()

Other useful containers set multiset map and multimap4958

Generic Programming and the STL

Traversal of containers can be achieved via iterators whichrequire suitable member functions begin() and end()

stdvectorltdoublegtconst_iterator sifor (si=sbegin() si = send() si++)

stdcout ltlt si ltlt stdendl

5058

Generic Programming and the STL

Another key STL part are algorithms

double sum = accumulate(sbegin() send() 0)

Some other STL algorithms are

middot find finds the first element equal to the supplied valuemiddot count counts the number of matching elementsmiddot transform applies a supplied function to each elementmiddot for_each sweeps over all elements does not altermiddot inner_product inner product of two vectors

5158

Template Programming

Template programming provides a lsquolanguage within C++rsquocode gets evaluated during compilation

One of the simplest template examples is

template lttypename Tgtconst Tamp min(const Tamp x const Tamp y)

return y lt x y x

This can now be used to compute the minimum between twoint variables or double or in fact any admissible typeproviding an operatorlt() for less-than comparison

5258

Template Programming

Another template example is a class squaring its argument

template lttypename Tgtclass square public stdunary_functionltTTgt public

T operator()(T t) const return tt

which can be used along with STL algorithms

transform(xbegin() xend() square)

5358

Further Reading

Books by Meyers are excellent

I also like the (free) C++ Annotations

C++ FAQ

Resources on StackOverflow such as

middot general info and its links egmiddot booklist

5458

Debugging

Debugging

Some tips

middot Generally painful old-school printf() still pervasivemiddot Debuggers go along with compilers gdb for gcc and g++lldb for the clang llvm family

middot Extra tools such as valgrind helpful for memory debuggingmiddot ldquoSanitizerrdquo (ASANUBSAN) in newer versions of g++ andclang++

5658

Best Practices

Best Practices

Version control highly recommended to become familiar withgit or svn

Editor in the long-run recommended to learn productivitytricks for one editor emacs vi eclipse RStudio hellip

5858

  • Overview
    • Why R
    • Why C++
    • Vision
    • Speed
      • What Next
      • C++ Basics
        • A Better C
          • Debugging
          • Best Practices
Page 25: Higher-Performance R via C++ - Part 1: Introductiondirk.eddelbuettel.com/papers/rcpp_uzuerich_2015_part1_intro.pdf · · C++11 is a big deal giving us new language features · While

Why Rcpp

middot Easy to learn as it really does not have to be thatcomplicated ndash we will see numerous few examples

middot Easy to use as it avoids build and OS system complexitiesthanks to the R infrastrucure

middot Expressive as it allows for vectorised C++ using Rcpp Sugarmiddot Seamless access to all R objects vector matrix list

S3S4RefClass Environment Function hellipmiddot Speed gains for a variety of tasks Rcpp excels precisely

where R struggles loops function calls hellipmiddot Extensions greatly facilitates access to external libraries

using eg Rcpp modules

2458

Overview Speed

Speed Example 1 (due to Christian Robert)

Five different ways to compute 1(1 + x)

f lt- function(n x=1) for(i in 1n) x lt- 1(1+x)g lt- function(n x=1) for(i in 1n) x lt- (1(1+x))h lt- function(n x=1) for(i in 1n) x lt- (1+x)^(-1)j lt- function(n x=1) for(i in 1n) x lt- 11+xk lt- function(n x=1) for(i in 1n) x lt- 11+xlibrary(rbenchmark)N lt- 1e5benchmark(f(N1)g(N1)h(N1)j(N1)k(N1)order=relative)[14]

2658

Speed Example 1 (due to Christian Robert)

test replications elapsed relative 5 k(N 1) 100 6435 1000 1 f(N 1) 100 6609 1027 2 g(N 1) 100 7757 1205 4 j(N 1) 100 7882 1225 3 h(N 1) 100 11766 1828

2758

Speed Example 1 (due to Christian Robert)

Adding a C++ variant is easy

cppFunction(double m(int n double x)

for (int i=0 iltn i++)x = 1 (1+x)

return x

)

(We will learn more about cppFunction() later)

2858

Speed Example 1 (due to Christian Robert)

test replications elapsed relative 6 m(N 1) 100 0170 1000 1 f(N 1) 100 6854 40318 5 k(N 1) 100 7811 45947 4 j(N 1) 100 9183 54018 2 g(N 1) 100 9489 55818 3 h(N 1) 100 11725 68971

2958

Speed Example 2 (due to StackOverflow)

Consider a function defined as

f(n) such that n when n lt 2

f(n minus 1) + f(n minus 2) when n ge 2

3058

Speed Example 2 (due to StackOverflow)

R implementation and use

f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))

Using it on first 11 argumentssapply(010 f)

[1] 0 1 1 2 3 5 8 13 21 34 55

3158

Speed Example 2 (due to StackOverflow)

Timing

library(rbenchmark)benchmark(f(10) f(15) f(20))[14]

test replications elapsed relative 1 f(10) 100 0030 1000 2 f(15) 100 0335 11167 3 f(20) 100 3517 117233

3258

Speed Example 2 (due to StackOverflow)

int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))

deployed as

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

3358

Speed Example 2 (due to StackOverflow)

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(20) g(20) order=relative)[14]

test replications elapsed relative 2 g(20) 100 0011 1000 1 f(20) 100 3776 343273

A nice gain of a few orders of magnitude3458

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

3558

What Next

Hands-on

Programming with C++

middot C++ Basicsmiddot Debuggingmiddot Best Practices

and then on to Rcpp itself

3758

C++ Basics

Compiled not Interpreted

Need to compile and link

include ltcstdiogt

int main(void) printf(Hello worldn)return 0

3958

Compiled not Interpreted

Or streams output rather than printf

include ltiostreamgt

int main(void) stdcout ltlt Hello world ltlt stdendlreturn 0

4058

Compiled not Interpreted

g++ -o will compile and link

We will now look at an examples with explicit linking

4158

Compiled not Interpreted

include ltcstdiogt

define MATHLIB_STANDALONEinclude ltRmathhgt

int main(void) printf(N(01) 95th percentile 98fn

qnorm(095 00 10 1 0))

4258

Compiled not Interpreted

We may need to supply

middot header location via -Imiddot library location via -Lmiddot library via -llibraryname

g++ -Iusrinclude -c qnorm_rmathcppg++ -o qnorm_rmath qnorm_rmatho -Lusrlib -lRmath

4358

Statically Typed

middot R is dynamically typed x lt- 314 x lt- foo is validmiddot In C++ each variable must be declared before first usemiddot Common types are int and long (possibly with unsigned)float and double bool as well as char

middot No standard string type though stdstring is closemiddot All these variables types are scalars which is fundamentally

different from R where everything is a vectormiddot class (and struct) allow creation of composite types

classes add behaviour to data to form objectsmiddot Variables need to be declared cannot change

4458

C++ Basics A Better C

C++ is a Better C

middot control structures similar to what R offers for while ifswitch

middot functions are similar too but note the difference inpositional-only matching also same function name butdifferent arguments allowed in C++

middot pointers and memory management very different but lotsof issues people had with C can be avoided via STL (whichis something Rcpp promotes too)

middot sometimes still useful to know what a pointer is hellip

4658

Object-Oriented

This is a second key feature of C++ and it does it differentlyfrom S3 and S4

struct Date unsigned int yearunsigned int monthunsigned int day

struct Person char firstname[20]char lastname[20]struct Date birthdayunsigned long id

4758

Object-Oriented

Object-orientation in the C++ sense matches data with codeoperating on it

class Date private

unsigned int yearunsigned int monthunsigned int date

publicvoid setDate(int y int m int d)int getDay()int getMonth()int getYear()

4858

Generic Programming and the STL

The STL promotes generic programming

For example the sequence container types vector dequeand list all support

middot push_back() to insert at the endmiddot pop_back() to remove from the frontmiddot begin() returning an iterator to the first elementmiddot end() returning an iterator to just after the last elementmiddot size() for the number of elements

but only list has push_front() and pop_front()

Other useful containers set multiset map and multimap4958

Generic Programming and the STL

Traversal of containers can be achieved via iterators whichrequire suitable member functions begin() and end()

stdvectorltdoublegtconst_iterator sifor (si=sbegin() si = send() si++)

stdcout ltlt si ltlt stdendl

5058

Generic Programming and the STL

Another key STL part are algorithms

double sum = accumulate(sbegin() send() 0)

Some other STL algorithms are

middot find finds the first element equal to the supplied valuemiddot count counts the number of matching elementsmiddot transform applies a supplied function to each elementmiddot for_each sweeps over all elements does not altermiddot inner_product inner product of two vectors

5158

Template Programming

Template programming provides a lsquolanguage within C++rsquocode gets evaluated during compilation

One of the simplest template examples is

template lttypename Tgtconst Tamp min(const Tamp x const Tamp y)

return y lt x y x

This can now be used to compute the minimum between twoint variables or double or in fact any admissible typeproviding an operatorlt() for less-than comparison

5258

Template Programming

Another template example is a class squaring its argument

template lttypename Tgtclass square public stdunary_functionltTTgt public

T operator()(T t) const return tt

which can be used along with STL algorithms

transform(xbegin() xend() square)

5358

Further Reading

Books by Meyers are excellent

I also like the (free) C++ Annotations

C++ FAQ

Resources on StackOverflow such as

middot general info and its links egmiddot booklist

5458

Debugging

Debugging

Some tips

middot Generally painful old-school printf() still pervasivemiddot Debuggers go along with compilers gdb for gcc and g++lldb for the clang llvm family

middot Extra tools such as valgrind helpful for memory debuggingmiddot ldquoSanitizerrdquo (ASANUBSAN) in newer versions of g++ andclang++

5658

Best Practices

Best Practices

Version control highly recommended to become familiar withgit or svn

Editor in the long-run recommended to learn productivitytricks for one editor emacs vi eclipse RStudio hellip

5858

  • Overview
    • Why R
    • Why C++
    • Vision
    • Speed
      • What Next
      • C++ Basics
        • A Better C
          • Debugging
          • Best Practices
Page 26: Higher-Performance R via C++ - Part 1: Introductiondirk.eddelbuettel.com/papers/rcpp_uzuerich_2015_part1_intro.pdf · · C++11 is a big deal giving us new language features · While

Overview Speed

Speed Example 1 (due to Christian Robert)

Five different ways to compute 1(1 + x)

f lt- function(n x=1) for(i in 1n) x lt- 1(1+x)g lt- function(n x=1) for(i in 1n) x lt- (1(1+x))h lt- function(n x=1) for(i in 1n) x lt- (1+x)^(-1)j lt- function(n x=1) for(i in 1n) x lt- 11+xk lt- function(n x=1) for(i in 1n) x lt- 11+xlibrary(rbenchmark)N lt- 1e5benchmark(f(N1)g(N1)h(N1)j(N1)k(N1)order=relative)[14]

2658

Speed Example 1 (due to Christian Robert)

test replications elapsed relative 5 k(N 1) 100 6435 1000 1 f(N 1) 100 6609 1027 2 g(N 1) 100 7757 1205 4 j(N 1) 100 7882 1225 3 h(N 1) 100 11766 1828

2758

Speed Example 1 (due to Christian Robert)

Adding a C++ variant is easy

cppFunction(double m(int n double x)

for (int i=0 iltn i++)x = 1 (1+x)

return x

)

(We will learn more about cppFunction() later)

2858

Speed Example 1 (due to Christian Robert)

test replications elapsed relative 6 m(N 1) 100 0170 1000 1 f(N 1) 100 6854 40318 5 k(N 1) 100 7811 45947 4 j(N 1) 100 9183 54018 2 g(N 1) 100 9489 55818 3 h(N 1) 100 11725 68971

2958

Speed Example 2 (due to StackOverflow)

Consider a function defined as

f(n) such that n when n lt 2

f(n minus 1) + f(n minus 2) when n ge 2

3058

Speed Example 2 (due to StackOverflow)

R implementation and use

f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))

Using it on first 11 argumentssapply(010 f)

[1] 0 1 1 2 3 5 8 13 21 34 55

3158

Speed Example 2 (due to StackOverflow)

Timing

library(rbenchmark)benchmark(f(10) f(15) f(20))[14]

test replications elapsed relative 1 f(10) 100 0030 1000 2 f(15) 100 0335 11167 3 f(20) 100 3517 117233

3258

Speed Example 2 (due to StackOverflow)

int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))

deployed as

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

3358

Speed Example 2 (due to StackOverflow)

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(20) g(20) order=relative)[14]

test replications elapsed relative 2 g(20) 100 0011 1000 1 f(20) 100 3776 343273

A nice gain of a few orders of magnitude3458

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

3558

What Next

Hands-on

Programming with C++

middot C++ Basicsmiddot Debuggingmiddot Best Practices

and then on to Rcpp itself

3758

C++ Basics

Compiled not Interpreted

Need to compile and link

include ltcstdiogt

int main(void) printf(Hello worldn)return 0

3958

Compiled not Interpreted

Or streams output rather than printf

include ltiostreamgt

int main(void) stdcout ltlt Hello world ltlt stdendlreturn 0

4058

Compiled not Interpreted

g++ -o will compile and link

We will now look at an examples with explicit linking

4158

Compiled not Interpreted

include ltcstdiogt

define MATHLIB_STANDALONEinclude ltRmathhgt

int main(void) printf(N(01) 95th percentile 98fn

qnorm(095 00 10 1 0))

4258

Compiled not Interpreted

We may need to supply

middot header location via -Imiddot library location via -Lmiddot library via -llibraryname

g++ -Iusrinclude -c qnorm_rmathcppg++ -o qnorm_rmath qnorm_rmatho -Lusrlib -lRmath

4358

Statically Typed

middot R is dynamically typed x lt- 314 x lt- foo is validmiddot In C++ each variable must be declared before first usemiddot Common types are int and long (possibly with unsigned)float and double bool as well as char

middot No standard string type though stdstring is closemiddot All these variables types are scalars which is fundamentally

different from R where everything is a vectormiddot class (and struct) allow creation of composite types

classes add behaviour to data to form objectsmiddot Variables need to be declared cannot change

4458

C++ Basics A Better C

C++ is a Better C

middot control structures similar to what R offers for while ifswitch

middot functions are similar too but note the difference inpositional-only matching also same function name butdifferent arguments allowed in C++

middot pointers and memory management very different but lotsof issues people had with C can be avoided via STL (whichis something Rcpp promotes too)

middot sometimes still useful to know what a pointer is hellip

4658

Object-Oriented

This is a second key feature of C++ and it does it differentlyfrom S3 and S4

struct Date unsigned int yearunsigned int monthunsigned int day

struct Person char firstname[20]char lastname[20]struct Date birthdayunsigned long id

4758

Object-Oriented

Object-orientation in the C++ sense matches data with codeoperating on it

class Date private

unsigned int yearunsigned int monthunsigned int date

publicvoid setDate(int y int m int d)int getDay()int getMonth()int getYear()

4858

Generic Programming and the STL

The STL promotes generic programming

For example the sequence container types vector dequeand list all support

middot push_back() to insert at the endmiddot pop_back() to remove from the frontmiddot begin() returning an iterator to the first elementmiddot end() returning an iterator to just after the last elementmiddot size() for the number of elements

but only list has push_front() and pop_front()

Other useful containers set multiset map and multimap4958

Generic Programming and the STL

Traversal of containers can be achieved via iterators whichrequire suitable member functions begin() and end()

stdvectorltdoublegtconst_iterator sifor (si=sbegin() si = send() si++)

stdcout ltlt si ltlt stdendl

5058

Generic Programming and the STL

Another key STL part are algorithms

double sum = accumulate(sbegin() send() 0)

Some other STL algorithms are

middot find finds the first element equal to the supplied valuemiddot count counts the number of matching elementsmiddot transform applies a supplied function to each elementmiddot for_each sweeps over all elements does not altermiddot inner_product inner product of two vectors

5158

Template Programming

Template programming provides a lsquolanguage within C++rsquocode gets evaluated during compilation

One of the simplest template examples is

template lttypename Tgtconst Tamp min(const Tamp x const Tamp y)

return y lt x y x

This can now be used to compute the minimum between twoint variables or double or in fact any admissible typeproviding an operatorlt() for less-than comparison

5258

Template Programming

Another template example is a class squaring its argument

template lttypename Tgtclass square public stdunary_functionltTTgt public

T operator()(T t) const return tt

which can be used along with STL algorithms

transform(xbegin() xend() square)

5358

Further Reading

Books by Meyers are excellent

I also like the (free) C++ Annotations

C++ FAQ

Resources on StackOverflow such as

middot general info and its links egmiddot booklist

5458

Debugging

Debugging

Some tips

middot Generally painful old-school printf() still pervasivemiddot Debuggers go along with compilers gdb for gcc and g++lldb for the clang llvm family

middot Extra tools such as valgrind helpful for memory debuggingmiddot ldquoSanitizerrdquo (ASANUBSAN) in newer versions of g++ andclang++

5658

Best Practices

Best Practices

Version control highly recommended to become familiar withgit or svn

Editor in the long-run recommended to learn productivitytricks for one editor emacs vi eclipse RStudio hellip

5858

  • Overview
    • Why R
    • Why C++
    • Vision
    • Speed
      • What Next
      • C++ Basics
        • A Better C
          • Debugging
          • Best Practices
Page 27: Higher-Performance R via C++ - Part 1: Introductiondirk.eddelbuettel.com/papers/rcpp_uzuerich_2015_part1_intro.pdf · · C++11 is a big deal giving us new language features · While

Speed Example 1 (due to Christian Robert)

Five different ways to compute 1(1 + x)

f lt- function(n x=1) for(i in 1n) x lt- 1(1+x)g lt- function(n x=1) for(i in 1n) x lt- (1(1+x))h lt- function(n x=1) for(i in 1n) x lt- (1+x)^(-1)j lt- function(n x=1) for(i in 1n) x lt- 11+xk lt- function(n x=1) for(i in 1n) x lt- 11+xlibrary(rbenchmark)N lt- 1e5benchmark(f(N1)g(N1)h(N1)j(N1)k(N1)order=relative)[14]

2658

Speed Example 1 (due to Christian Robert)

test replications elapsed relative 5 k(N 1) 100 6435 1000 1 f(N 1) 100 6609 1027 2 g(N 1) 100 7757 1205 4 j(N 1) 100 7882 1225 3 h(N 1) 100 11766 1828

2758

Speed Example 1 (due to Christian Robert)

Adding a C++ variant is easy

cppFunction(double m(int n double x)

for (int i=0 iltn i++)x = 1 (1+x)

return x

)

(We will learn more about cppFunction() later)

2858

Speed Example 1 (due to Christian Robert)

test replications elapsed relative 6 m(N 1) 100 0170 1000 1 f(N 1) 100 6854 40318 5 k(N 1) 100 7811 45947 4 j(N 1) 100 9183 54018 2 g(N 1) 100 9489 55818 3 h(N 1) 100 11725 68971

2958

Speed Example 2 (due to StackOverflow)

Consider a function defined as

f(n) such that n when n lt 2

f(n minus 1) + f(n minus 2) when n ge 2

3058

Speed Example 2 (due to StackOverflow)

R implementation and use

f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))

Using it on first 11 argumentssapply(010 f)

[1] 0 1 1 2 3 5 8 13 21 34 55

3158

Speed Example 2 (due to StackOverflow)

Timing

library(rbenchmark)benchmark(f(10) f(15) f(20))[14]

test replications elapsed relative 1 f(10) 100 0030 1000 2 f(15) 100 0335 11167 3 f(20) 100 3517 117233

3258

Speed Example 2 (due to StackOverflow)

int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))

deployed as

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

3358

Speed Example 2 (due to StackOverflow)

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(20) g(20) order=relative)[14]

test replications elapsed relative 2 g(20) 100 0011 1000 1 f(20) 100 3776 343273

A nice gain of a few orders of magnitude3458

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

3558

What Next

Hands-on

Programming with C++

middot C++ Basicsmiddot Debuggingmiddot Best Practices

and then on to Rcpp itself

3758

C++ Basics

Compiled not Interpreted

Need to compile and link

include ltcstdiogt

int main(void) printf(Hello worldn)return 0

3958

Compiled not Interpreted

Or streams output rather than printf

include ltiostreamgt

int main(void) stdcout ltlt Hello world ltlt stdendlreturn 0

4058

Compiled not Interpreted

g++ -o will compile and link

We will now look at an examples with explicit linking

4158

Compiled not Interpreted

include ltcstdiogt

define MATHLIB_STANDALONEinclude ltRmathhgt

int main(void) printf(N(01) 95th percentile 98fn

qnorm(095 00 10 1 0))

4258

Compiled not Interpreted

We may need to supply

middot header location via -Imiddot library location via -Lmiddot library via -llibraryname

g++ -Iusrinclude -c qnorm_rmathcppg++ -o qnorm_rmath qnorm_rmatho -Lusrlib -lRmath

4358

Statically Typed

middot R is dynamically typed x lt- 314 x lt- foo is validmiddot In C++ each variable must be declared before first usemiddot Common types are int and long (possibly with unsigned)float and double bool as well as char

middot No standard string type though stdstring is closemiddot All these variables types are scalars which is fundamentally

different from R where everything is a vectormiddot class (and struct) allow creation of composite types

classes add behaviour to data to form objectsmiddot Variables need to be declared cannot change

4458

C++ Basics A Better C

C++ is a Better C

middot control structures similar to what R offers for while ifswitch

middot functions are similar too but note the difference inpositional-only matching also same function name butdifferent arguments allowed in C++

middot pointers and memory management very different but lotsof issues people had with C can be avoided via STL (whichis something Rcpp promotes too)

middot sometimes still useful to know what a pointer is hellip

4658

Object-Oriented

This is a second key feature of C++ and it does it differentlyfrom S3 and S4

struct Date unsigned int yearunsigned int monthunsigned int day

struct Person char firstname[20]char lastname[20]struct Date birthdayunsigned long id

4758

Object-Oriented

Object-orientation in the C++ sense matches data with codeoperating on it

class Date private

unsigned int yearunsigned int monthunsigned int date

publicvoid setDate(int y int m int d)int getDay()int getMonth()int getYear()

4858

Generic Programming and the STL

The STL promotes generic programming

For example the sequence container types vector dequeand list all support

middot push_back() to insert at the endmiddot pop_back() to remove from the frontmiddot begin() returning an iterator to the first elementmiddot end() returning an iterator to just after the last elementmiddot size() for the number of elements

but only list has push_front() and pop_front()

Other useful containers set multiset map and multimap4958

Generic Programming and the STL

Traversal of containers can be achieved via iterators whichrequire suitable member functions begin() and end()

stdvectorltdoublegtconst_iterator sifor (si=sbegin() si = send() si++)

stdcout ltlt si ltlt stdendl

5058

Generic Programming and the STL

Another key STL part are algorithms

double sum = accumulate(sbegin() send() 0)

Some other STL algorithms are

middot find finds the first element equal to the supplied valuemiddot count counts the number of matching elementsmiddot transform applies a supplied function to each elementmiddot for_each sweeps over all elements does not altermiddot inner_product inner product of two vectors

5158

Template Programming

Template programming provides a lsquolanguage within C++rsquocode gets evaluated during compilation

One of the simplest template examples is

template lttypename Tgtconst Tamp min(const Tamp x const Tamp y)

return y lt x y x

This can now be used to compute the minimum between twoint variables or double or in fact any admissible typeproviding an operatorlt() for less-than comparison

5258

Template Programming

Another template example is a class squaring its argument

template lttypename Tgtclass square public stdunary_functionltTTgt public

T operator()(T t) const return tt

which can be used along with STL algorithms

transform(xbegin() xend() square)

5358

Further Reading

Books by Meyers are excellent

I also like the (free) C++ Annotations

C++ FAQ

Resources on StackOverflow such as

middot general info and its links egmiddot booklist

5458

Debugging

Debugging

Some tips

middot Generally painful old-school printf() still pervasivemiddot Debuggers go along with compilers gdb for gcc and g++lldb for the clang llvm family

middot Extra tools such as valgrind helpful for memory debuggingmiddot ldquoSanitizerrdquo (ASANUBSAN) in newer versions of g++ andclang++

5658

Best Practices

Best Practices

Version control highly recommended to become familiar withgit or svn

Editor in the long-run recommended to learn productivitytricks for one editor emacs vi eclipse RStudio hellip

5858

  • Overview
    • Why R
    • Why C++
    • Vision
    • Speed
      • What Next
      • C++ Basics
        • A Better C
          • Debugging
          • Best Practices
Page 28: Higher-Performance R via C++ - Part 1: Introductiondirk.eddelbuettel.com/papers/rcpp_uzuerich_2015_part1_intro.pdf · · C++11 is a big deal giving us new language features · While

Speed Example 1 (due to Christian Robert)

test replications elapsed relative 5 k(N 1) 100 6435 1000 1 f(N 1) 100 6609 1027 2 g(N 1) 100 7757 1205 4 j(N 1) 100 7882 1225 3 h(N 1) 100 11766 1828

2758

Speed Example 1 (due to Christian Robert)

Adding a C++ variant is easy

cppFunction(double m(int n double x)

for (int i=0 iltn i++)x = 1 (1+x)

return x

)

(We will learn more about cppFunction() later)

2858

Speed Example 1 (due to Christian Robert)

test replications elapsed relative 6 m(N 1) 100 0170 1000 1 f(N 1) 100 6854 40318 5 k(N 1) 100 7811 45947 4 j(N 1) 100 9183 54018 2 g(N 1) 100 9489 55818 3 h(N 1) 100 11725 68971

2958

Speed Example 2 (due to StackOverflow)

Consider a function defined as

f(n) such that n when n lt 2

f(n minus 1) + f(n minus 2) when n ge 2

3058

Speed Example 2 (due to StackOverflow)

R implementation and use

f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))

Using it on first 11 argumentssapply(010 f)

[1] 0 1 1 2 3 5 8 13 21 34 55

3158

Speed Example 2 (due to StackOverflow)

Timing

library(rbenchmark)benchmark(f(10) f(15) f(20))[14]

test replications elapsed relative 1 f(10) 100 0030 1000 2 f(15) 100 0335 11167 3 f(20) 100 3517 117233

3258

Speed Example 2 (due to StackOverflow)

int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))

deployed as

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

3358

Speed Example 2 (due to StackOverflow)

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(20) g(20) order=relative)[14]

test replications elapsed relative 2 g(20) 100 0011 1000 1 f(20) 100 3776 343273

A nice gain of a few orders of magnitude3458

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

3558

What Next

Hands-on

Programming with C++

middot C++ Basicsmiddot Debuggingmiddot Best Practices

and then on to Rcpp itself

3758

C++ Basics

Compiled not Interpreted

Need to compile and link

include ltcstdiogt

int main(void) printf(Hello worldn)return 0

3958

Compiled not Interpreted

Or streams output rather than printf

include ltiostreamgt

int main(void) stdcout ltlt Hello world ltlt stdendlreturn 0

4058

Compiled not Interpreted

g++ -o will compile and link

We will now look at an examples with explicit linking

4158

Compiled not Interpreted

include ltcstdiogt

define MATHLIB_STANDALONEinclude ltRmathhgt

int main(void) printf(N(01) 95th percentile 98fn

qnorm(095 00 10 1 0))

4258

Compiled not Interpreted

We may need to supply

middot header location via -Imiddot library location via -Lmiddot library via -llibraryname

g++ -Iusrinclude -c qnorm_rmathcppg++ -o qnorm_rmath qnorm_rmatho -Lusrlib -lRmath

4358

Statically Typed

middot R is dynamically typed x lt- 314 x lt- foo is validmiddot In C++ each variable must be declared before first usemiddot Common types are int and long (possibly with unsigned)float and double bool as well as char

middot No standard string type though stdstring is closemiddot All these variables types are scalars which is fundamentally

different from R where everything is a vectormiddot class (and struct) allow creation of composite types

classes add behaviour to data to form objectsmiddot Variables need to be declared cannot change

4458

C++ Basics A Better C

C++ is a Better C

middot control structures similar to what R offers for while ifswitch

middot functions are similar too but note the difference inpositional-only matching also same function name butdifferent arguments allowed in C++

middot pointers and memory management very different but lotsof issues people had with C can be avoided via STL (whichis something Rcpp promotes too)

middot sometimes still useful to know what a pointer is hellip

4658

Object-Oriented

This is a second key feature of C++ and it does it differentlyfrom S3 and S4

struct Date unsigned int yearunsigned int monthunsigned int day

struct Person char firstname[20]char lastname[20]struct Date birthdayunsigned long id

4758

Object-Oriented

Object-orientation in the C++ sense matches data with codeoperating on it

class Date private

unsigned int yearunsigned int monthunsigned int date

publicvoid setDate(int y int m int d)int getDay()int getMonth()int getYear()

4858

Generic Programming and the STL

The STL promotes generic programming

For example the sequence container types vector dequeand list all support

middot push_back() to insert at the endmiddot pop_back() to remove from the frontmiddot begin() returning an iterator to the first elementmiddot end() returning an iterator to just after the last elementmiddot size() for the number of elements

but only list has push_front() and pop_front()

Other useful containers set multiset map and multimap4958

Generic Programming and the STL

Traversal of containers can be achieved via iterators whichrequire suitable member functions begin() and end()

stdvectorltdoublegtconst_iterator sifor (si=sbegin() si = send() si++)

stdcout ltlt si ltlt stdendl

5058

Generic Programming and the STL

Another key STL part are algorithms

double sum = accumulate(sbegin() send() 0)

Some other STL algorithms are

middot find finds the first element equal to the supplied valuemiddot count counts the number of matching elementsmiddot transform applies a supplied function to each elementmiddot for_each sweeps over all elements does not altermiddot inner_product inner product of two vectors

5158

Template Programming

Template programming provides a lsquolanguage within C++rsquocode gets evaluated during compilation

One of the simplest template examples is

template lttypename Tgtconst Tamp min(const Tamp x const Tamp y)

return y lt x y x

This can now be used to compute the minimum between twoint variables or double or in fact any admissible typeproviding an operatorlt() for less-than comparison

5258

Template Programming

Another template example is a class squaring its argument

template lttypename Tgtclass square public stdunary_functionltTTgt public

T operator()(T t) const return tt

which can be used along with STL algorithms

transform(xbegin() xend() square)

5358

Further Reading

Books by Meyers are excellent

I also like the (free) C++ Annotations

C++ FAQ

Resources on StackOverflow such as

middot general info and its links egmiddot booklist

5458

Debugging

Debugging

Some tips

middot Generally painful old-school printf() still pervasivemiddot Debuggers go along with compilers gdb for gcc and g++lldb for the clang llvm family

middot Extra tools such as valgrind helpful for memory debuggingmiddot ldquoSanitizerrdquo (ASANUBSAN) in newer versions of g++ andclang++

5658

Best Practices

Best Practices

Version control highly recommended to become familiar withgit or svn

Editor in the long-run recommended to learn productivitytricks for one editor emacs vi eclipse RStudio hellip

5858

  • Overview
    • Why R
    • Why C++
    • Vision
    • Speed
      • What Next
      • C++ Basics
        • A Better C
          • Debugging
          • Best Practices
Page 29: Higher-Performance R via C++ - Part 1: Introductiondirk.eddelbuettel.com/papers/rcpp_uzuerich_2015_part1_intro.pdf · · C++11 is a big deal giving us new language features · While

Speed Example 1 (due to Christian Robert)

Adding a C++ variant is easy

cppFunction(double m(int n double x)

for (int i=0 iltn i++)x = 1 (1+x)

return x

)

(We will learn more about cppFunction() later)

2858

Speed Example 1 (due to Christian Robert)

test replications elapsed relative 6 m(N 1) 100 0170 1000 1 f(N 1) 100 6854 40318 5 k(N 1) 100 7811 45947 4 j(N 1) 100 9183 54018 2 g(N 1) 100 9489 55818 3 h(N 1) 100 11725 68971

2958

Speed Example 2 (due to StackOverflow)

Consider a function defined as

f(n) such that n when n lt 2

f(n minus 1) + f(n minus 2) when n ge 2

3058

Speed Example 2 (due to StackOverflow)

R implementation and use

f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))

Using it on first 11 argumentssapply(010 f)

[1] 0 1 1 2 3 5 8 13 21 34 55

3158

Speed Example 2 (due to StackOverflow)

Timing

library(rbenchmark)benchmark(f(10) f(15) f(20))[14]

test replications elapsed relative 1 f(10) 100 0030 1000 2 f(15) 100 0335 11167 3 f(20) 100 3517 117233

3258

Speed Example 2 (due to StackOverflow)

int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))

deployed as

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

3358

Speed Example 2 (due to StackOverflow)

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(20) g(20) order=relative)[14]

test replications elapsed relative 2 g(20) 100 0011 1000 1 f(20) 100 3776 343273

A nice gain of a few orders of magnitude3458

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

3558

What Next

Hands-on

Programming with C++

middot C++ Basicsmiddot Debuggingmiddot Best Practices

and then on to Rcpp itself

3758

C++ Basics

Compiled not Interpreted

Need to compile and link

include ltcstdiogt

int main(void) printf(Hello worldn)return 0

3958

Compiled not Interpreted

Or streams output rather than printf

include ltiostreamgt

int main(void) stdcout ltlt Hello world ltlt stdendlreturn 0

4058

Compiled not Interpreted

g++ -o will compile and link

We will now look at an examples with explicit linking

4158

Compiled not Interpreted

include ltcstdiogt

define MATHLIB_STANDALONEinclude ltRmathhgt

int main(void) printf(N(01) 95th percentile 98fn

qnorm(095 00 10 1 0))

4258

Compiled not Interpreted

We may need to supply

middot header location via -Imiddot library location via -Lmiddot library via -llibraryname

g++ -Iusrinclude -c qnorm_rmathcppg++ -o qnorm_rmath qnorm_rmatho -Lusrlib -lRmath

4358

Statically Typed

middot R is dynamically typed x lt- 314 x lt- foo is validmiddot In C++ each variable must be declared before first usemiddot Common types are int and long (possibly with unsigned)float and double bool as well as char

middot No standard string type though stdstring is closemiddot All these variables types are scalars which is fundamentally

different from R where everything is a vectormiddot class (and struct) allow creation of composite types

classes add behaviour to data to form objectsmiddot Variables need to be declared cannot change

4458

C++ Basics A Better C

C++ is a Better C

middot control structures similar to what R offers for while ifswitch

middot functions are similar too but note the difference inpositional-only matching also same function name butdifferent arguments allowed in C++

middot pointers and memory management very different but lotsof issues people had with C can be avoided via STL (whichis something Rcpp promotes too)

middot sometimes still useful to know what a pointer is hellip

4658

Object-Oriented

This is a second key feature of C++ and it does it differentlyfrom S3 and S4

struct Date unsigned int yearunsigned int monthunsigned int day

struct Person char firstname[20]char lastname[20]struct Date birthdayunsigned long id

4758

Object-Oriented

Object-orientation in the C++ sense matches data with codeoperating on it

class Date private

unsigned int yearunsigned int monthunsigned int date

publicvoid setDate(int y int m int d)int getDay()int getMonth()int getYear()

4858

Generic Programming and the STL

The STL promotes generic programming

For example the sequence container types vector dequeand list all support

middot push_back() to insert at the endmiddot pop_back() to remove from the frontmiddot begin() returning an iterator to the first elementmiddot end() returning an iterator to just after the last elementmiddot size() for the number of elements

but only list has push_front() and pop_front()

Other useful containers set multiset map and multimap4958

Generic Programming and the STL

Traversal of containers can be achieved via iterators whichrequire suitable member functions begin() and end()

stdvectorltdoublegtconst_iterator sifor (si=sbegin() si = send() si++)

stdcout ltlt si ltlt stdendl

5058

Generic Programming and the STL

Another key STL part are algorithms

double sum = accumulate(sbegin() send() 0)

Some other STL algorithms are

middot find finds the first element equal to the supplied valuemiddot count counts the number of matching elementsmiddot transform applies a supplied function to each elementmiddot for_each sweeps over all elements does not altermiddot inner_product inner product of two vectors

5158

Template Programming

Template programming provides a lsquolanguage within C++rsquocode gets evaluated during compilation

One of the simplest template examples is

template lttypename Tgtconst Tamp min(const Tamp x const Tamp y)

return y lt x y x

This can now be used to compute the minimum between twoint variables or double or in fact any admissible typeproviding an operatorlt() for less-than comparison

5258

Template Programming

Another template example is a class squaring its argument

template lttypename Tgtclass square public stdunary_functionltTTgt public

T operator()(T t) const return tt

which can be used along with STL algorithms

transform(xbegin() xend() square)

5358

Further Reading

Books by Meyers are excellent

I also like the (free) C++ Annotations

C++ FAQ

Resources on StackOverflow such as

middot general info and its links egmiddot booklist

5458

Debugging

Debugging

Some tips

middot Generally painful old-school printf() still pervasivemiddot Debuggers go along with compilers gdb for gcc and g++lldb for the clang llvm family

middot Extra tools such as valgrind helpful for memory debuggingmiddot ldquoSanitizerrdquo (ASANUBSAN) in newer versions of g++ andclang++

5658

Best Practices

Best Practices

Version control highly recommended to become familiar withgit or svn

Editor in the long-run recommended to learn productivitytricks for one editor emacs vi eclipse RStudio hellip

5858

  • Overview
    • Why R
    • Why C++
    • Vision
    • Speed
      • What Next
      • C++ Basics
        • A Better C
          • Debugging
          • Best Practices
Page 30: Higher-Performance R via C++ - Part 1: Introductiondirk.eddelbuettel.com/papers/rcpp_uzuerich_2015_part1_intro.pdf · · C++11 is a big deal giving us new language features · While

Speed Example 1 (due to Christian Robert)

test replications elapsed relative 6 m(N 1) 100 0170 1000 1 f(N 1) 100 6854 40318 5 k(N 1) 100 7811 45947 4 j(N 1) 100 9183 54018 2 g(N 1) 100 9489 55818 3 h(N 1) 100 11725 68971

2958

Speed Example 2 (due to StackOverflow)

Consider a function defined as

f(n) such that n when n lt 2

f(n minus 1) + f(n minus 2) when n ge 2

3058

Speed Example 2 (due to StackOverflow)

R implementation and use

f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))

Using it on first 11 argumentssapply(010 f)

[1] 0 1 1 2 3 5 8 13 21 34 55

3158

Speed Example 2 (due to StackOverflow)

Timing

library(rbenchmark)benchmark(f(10) f(15) f(20))[14]

test replications elapsed relative 1 f(10) 100 0030 1000 2 f(15) 100 0335 11167 3 f(20) 100 3517 117233

3258

Speed Example 2 (due to StackOverflow)

int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))

deployed as

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

3358

Speed Example 2 (due to StackOverflow)

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(20) g(20) order=relative)[14]

test replications elapsed relative 2 g(20) 100 0011 1000 1 f(20) 100 3776 343273

A nice gain of a few orders of magnitude3458

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

3558

What Next

Hands-on

Programming with C++

middot C++ Basicsmiddot Debuggingmiddot Best Practices

and then on to Rcpp itself

3758

C++ Basics

Compiled not Interpreted

Need to compile and link

include ltcstdiogt

int main(void) printf(Hello worldn)return 0

3958

Compiled not Interpreted

Or streams output rather than printf

include ltiostreamgt

int main(void) stdcout ltlt Hello world ltlt stdendlreturn 0

4058

Compiled not Interpreted

g++ -o will compile and link

We will now look at an examples with explicit linking

4158

Compiled not Interpreted

include ltcstdiogt

define MATHLIB_STANDALONEinclude ltRmathhgt

int main(void) printf(N(01) 95th percentile 98fn

qnorm(095 00 10 1 0))

4258

Compiled not Interpreted

We may need to supply

middot header location via -Imiddot library location via -Lmiddot library via -llibraryname

g++ -Iusrinclude -c qnorm_rmathcppg++ -o qnorm_rmath qnorm_rmatho -Lusrlib -lRmath

4358

Statically Typed

middot R is dynamically typed x lt- 314 x lt- foo is validmiddot In C++ each variable must be declared before first usemiddot Common types are int and long (possibly with unsigned)float and double bool as well as char

middot No standard string type though stdstring is closemiddot All these variables types are scalars which is fundamentally

different from R where everything is a vectormiddot class (and struct) allow creation of composite types

classes add behaviour to data to form objectsmiddot Variables need to be declared cannot change

4458

C++ Basics A Better C

C++ is a Better C

middot control structures similar to what R offers for while ifswitch

middot functions are similar too but note the difference inpositional-only matching also same function name butdifferent arguments allowed in C++

middot pointers and memory management very different but lotsof issues people had with C can be avoided via STL (whichis something Rcpp promotes too)

middot sometimes still useful to know what a pointer is hellip

4658

Object-Oriented

This is a second key feature of C++ and it does it differentlyfrom S3 and S4

struct Date unsigned int yearunsigned int monthunsigned int day

struct Person char firstname[20]char lastname[20]struct Date birthdayunsigned long id

4758

Object-Oriented

Object-orientation in the C++ sense matches data with codeoperating on it

class Date private

unsigned int yearunsigned int monthunsigned int date

publicvoid setDate(int y int m int d)int getDay()int getMonth()int getYear()

4858

Generic Programming and the STL

The STL promotes generic programming

For example the sequence container types vector dequeand list all support

middot push_back() to insert at the endmiddot pop_back() to remove from the frontmiddot begin() returning an iterator to the first elementmiddot end() returning an iterator to just after the last elementmiddot size() for the number of elements

but only list has push_front() and pop_front()

Other useful containers set multiset map and multimap4958

Generic Programming and the STL

Traversal of containers can be achieved via iterators whichrequire suitable member functions begin() and end()

stdvectorltdoublegtconst_iterator sifor (si=sbegin() si = send() si++)

stdcout ltlt si ltlt stdendl

5058

Generic Programming and the STL

Another key STL part are algorithms

double sum = accumulate(sbegin() send() 0)

Some other STL algorithms are

middot find finds the first element equal to the supplied valuemiddot count counts the number of matching elementsmiddot transform applies a supplied function to each elementmiddot for_each sweeps over all elements does not altermiddot inner_product inner product of two vectors

5158

Template Programming

Template programming provides a lsquolanguage within C++rsquocode gets evaluated during compilation

One of the simplest template examples is

template lttypename Tgtconst Tamp min(const Tamp x const Tamp y)

return y lt x y x

This can now be used to compute the minimum between twoint variables or double or in fact any admissible typeproviding an operatorlt() for less-than comparison

5258

Template Programming

Another template example is a class squaring its argument

template lttypename Tgtclass square public stdunary_functionltTTgt public

T operator()(T t) const return tt

which can be used along with STL algorithms

transform(xbegin() xend() square)

5358

Further Reading

Books by Meyers are excellent

I also like the (free) C++ Annotations

C++ FAQ

Resources on StackOverflow such as

middot general info and its links egmiddot booklist

5458

Debugging

Debugging

Some tips

middot Generally painful old-school printf() still pervasivemiddot Debuggers go along with compilers gdb for gcc and g++lldb for the clang llvm family

middot Extra tools such as valgrind helpful for memory debuggingmiddot ldquoSanitizerrdquo (ASANUBSAN) in newer versions of g++ andclang++

5658

Best Practices

Best Practices

Version control highly recommended to become familiar withgit or svn

Editor in the long-run recommended to learn productivitytricks for one editor emacs vi eclipse RStudio hellip

5858

  • Overview
    • Why R
    • Why C++
    • Vision
    • Speed
      • What Next
      • C++ Basics
        • A Better C
          • Debugging
          • Best Practices
Page 31: Higher-Performance R via C++ - Part 1: Introductiondirk.eddelbuettel.com/papers/rcpp_uzuerich_2015_part1_intro.pdf · · C++11 is a big deal giving us new language features · While

Speed Example 2 (due to StackOverflow)

Consider a function defined as

f(n) such that n when n lt 2

f(n minus 1) + f(n minus 2) when n ge 2

3058

Speed Example 2 (due to StackOverflow)

R implementation and use

f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))

Using it on first 11 argumentssapply(010 f)

[1] 0 1 1 2 3 5 8 13 21 34 55

3158

Speed Example 2 (due to StackOverflow)

Timing

library(rbenchmark)benchmark(f(10) f(15) f(20))[14]

test replications elapsed relative 1 f(10) 100 0030 1000 2 f(15) 100 0335 11167 3 f(20) 100 3517 117233

3258

Speed Example 2 (due to StackOverflow)

int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))

deployed as

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

3358

Speed Example 2 (due to StackOverflow)

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(20) g(20) order=relative)[14]

test replications elapsed relative 2 g(20) 100 0011 1000 1 f(20) 100 3776 343273

A nice gain of a few orders of magnitude3458

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

3558

What Next

Hands-on

Programming with C++

middot C++ Basicsmiddot Debuggingmiddot Best Practices

and then on to Rcpp itself

3758

C++ Basics

Compiled not Interpreted

Need to compile and link

include ltcstdiogt

int main(void) printf(Hello worldn)return 0

3958

Compiled not Interpreted

Or streams output rather than printf

include ltiostreamgt

int main(void) stdcout ltlt Hello world ltlt stdendlreturn 0

4058

Compiled not Interpreted

g++ -o will compile and link

We will now look at an examples with explicit linking

4158

Compiled not Interpreted

include ltcstdiogt

define MATHLIB_STANDALONEinclude ltRmathhgt

int main(void) printf(N(01) 95th percentile 98fn

qnorm(095 00 10 1 0))

4258

Compiled not Interpreted

We may need to supply

middot header location via -Imiddot library location via -Lmiddot library via -llibraryname

g++ -Iusrinclude -c qnorm_rmathcppg++ -o qnorm_rmath qnorm_rmatho -Lusrlib -lRmath

4358

Statically Typed

middot R is dynamically typed x lt- 314 x lt- foo is validmiddot In C++ each variable must be declared before first usemiddot Common types are int and long (possibly with unsigned)float and double bool as well as char

middot No standard string type though stdstring is closemiddot All these variables types are scalars which is fundamentally

different from R where everything is a vectormiddot class (and struct) allow creation of composite types

classes add behaviour to data to form objectsmiddot Variables need to be declared cannot change

4458

C++ Basics A Better C

C++ is a Better C

middot control structures similar to what R offers for while ifswitch

middot functions are similar too but note the difference inpositional-only matching also same function name butdifferent arguments allowed in C++

middot pointers and memory management very different but lotsof issues people had with C can be avoided via STL (whichis something Rcpp promotes too)

middot sometimes still useful to know what a pointer is hellip

4658

Object-Oriented

This is a second key feature of C++ and it does it differentlyfrom S3 and S4

struct Date unsigned int yearunsigned int monthunsigned int day

struct Person char firstname[20]char lastname[20]struct Date birthdayunsigned long id

4758

Object-Oriented

Object-orientation in the C++ sense matches data with codeoperating on it

class Date private

unsigned int yearunsigned int monthunsigned int date

publicvoid setDate(int y int m int d)int getDay()int getMonth()int getYear()

4858

Generic Programming and the STL

The STL promotes generic programming

For example the sequence container types vector dequeand list all support

middot push_back() to insert at the endmiddot pop_back() to remove from the frontmiddot begin() returning an iterator to the first elementmiddot end() returning an iterator to just after the last elementmiddot size() for the number of elements

but only list has push_front() and pop_front()

Other useful containers set multiset map and multimap4958

Generic Programming and the STL

Traversal of containers can be achieved via iterators whichrequire suitable member functions begin() and end()

stdvectorltdoublegtconst_iterator sifor (si=sbegin() si = send() si++)

stdcout ltlt si ltlt stdendl

5058

Generic Programming and the STL

Another key STL part are algorithms

double sum = accumulate(sbegin() send() 0)

Some other STL algorithms are

middot find finds the first element equal to the supplied valuemiddot count counts the number of matching elementsmiddot transform applies a supplied function to each elementmiddot for_each sweeps over all elements does not altermiddot inner_product inner product of two vectors

5158

Template Programming

Template programming provides a lsquolanguage within C++rsquocode gets evaluated during compilation

One of the simplest template examples is

template lttypename Tgtconst Tamp min(const Tamp x const Tamp y)

return y lt x y x

This can now be used to compute the minimum between twoint variables or double or in fact any admissible typeproviding an operatorlt() for less-than comparison

5258

Template Programming

Another template example is a class squaring its argument

template lttypename Tgtclass square public stdunary_functionltTTgt public

T operator()(T t) const return tt

which can be used along with STL algorithms

transform(xbegin() xend() square)

5358

Further Reading

Books by Meyers are excellent

I also like the (free) C++ Annotations

C++ FAQ

Resources on StackOverflow such as

middot general info and its links egmiddot booklist

5458

Debugging

Debugging

Some tips

middot Generally painful old-school printf() still pervasivemiddot Debuggers go along with compilers gdb for gcc and g++lldb for the clang llvm family

middot Extra tools such as valgrind helpful for memory debuggingmiddot ldquoSanitizerrdquo (ASANUBSAN) in newer versions of g++ andclang++

5658

Best Practices

Best Practices

Version control highly recommended to become familiar withgit or svn

Editor in the long-run recommended to learn productivitytricks for one editor emacs vi eclipse RStudio hellip

5858

  • Overview
    • Why R
    • Why C++
    • Vision
    • Speed
      • What Next
      • C++ Basics
        • A Better C
          • Debugging
          • Best Practices
Page 32: Higher-Performance R via C++ - Part 1: Introductiondirk.eddelbuettel.com/papers/rcpp_uzuerich_2015_part1_intro.pdf · · C++11 is a big deal giving us new language features · While

Speed Example 2 (due to StackOverflow)

R implementation and use

f lt- function(n) if (n lt 2) return(n)return(f(n-1) + f(n-2))

Using it on first 11 argumentssapply(010 f)

[1] 0 1 1 2 3 5 8 13 21 34 55

3158

Speed Example 2 (due to StackOverflow)

Timing

library(rbenchmark)benchmark(f(10) f(15) f(20))[14]

test replications elapsed relative 1 f(10) 100 0030 1000 2 f(15) 100 0335 11167 3 f(20) 100 3517 117233

3258

Speed Example 2 (due to StackOverflow)

int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))

deployed as

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

3358

Speed Example 2 (due to StackOverflow)

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(20) g(20) order=relative)[14]

test replications elapsed relative 2 g(20) 100 0011 1000 1 f(20) 100 3776 343273

A nice gain of a few orders of magnitude3458

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

3558

What Next

Hands-on

Programming with C++

middot C++ Basicsmiddot Debuggingmiddot Best Practices

and then on to Rcpp itself

3758

C++ Basics

Compiled not Interpreted

Need to compile and link

include ltcstdiogt

int main(void) printf(Hello worldn)return 0

3958

Compiled not Interpreted

Or streams output rather than printf

include ltiostreamgt

int main(void) stdcout ltlt Hello world ltlt stdendlreturn 0

4058

Compiled not Interpreted

g++ -o will compile and link

We will now look at an examples with explicit linking

4158

Compiled not Interpreted

include ltcstdiogt

define MATHLIB_STANDALONEinclude ltRmathhgt

int main(void) printf(N(01) 95th percentile 98fn

qnorm(095 00 10 1 0))

4258

Compiled not Interpreted

We may need to supply

middot header location via -Imiddot library location via -Lmiddot library via -llibraryname

g++ -Iusrinclude -c qnorm_rmathcppg++ -o qnorm_rmath qnorm_rmatho -Lusrlib -lRmath

4358

Statically Typed

middot R is dynamically typed x lt- 314 x lt- foo is validmiddot In C++ each variable must be declared before first usemiddot Common types are int and long (possibly with unsigned)float and double bool as well as char

middot No standard string type though stdstring is closemiddot All these variables types are scalars which is fundamentally

different from R where everything is a vectormiddot class (and struct) allow creation of composite types

classes add behaviour to data to form objectsmiddot Variables need to be declared cannot change

4458

C++ Basics A Better C

C++ is a Better C

middot control structures similar to what R offers for while ifswitch

middot functions are similar too but note the difference inpositional-only matching also same function name butdifferent arguments allowed in C++

middot pointers and memory management very different but lotsof issues people had with C can be avoided via STL (whichis something Rcpp promotes too)

middot sometimes still useful to know what a pointer is hellip

4658

Object-Oriented

This is a second key feature of C++ and it does it differentlyfrom S3 and S4

struct Date unsigned int yearunsigned int monthunsigned int day

struct Person char firstname[20]char lastname[20]struct Date birthdayunsigned long id

4758

Object-Oriented

Object-orientation in the C++ sense matches data with codeoperating on it

class Date private

unsigned int yearunsigned int monthunsigned int date

publicvoid setDate(int y int m int d)int getDay()int getMonth()int getYear()

4858

Generic Programming and the STL

The STL promotes generic programming

For example the sequence container types vector dequeand list all support

middot push_back() to insert at the endmiddot pop_back() to remove from the frontmiddot begin() returning an iterator to the first elementmiddot end() returning an iterator to just after the last elementmiddot size() for the number of elements

but only list has push_front() and pop_front()

Other useful containers set multiset map and multimap4958

Generic Programming and the STL

Traversal of containers can be achieved via iterators whichrequire suitable member functions begin() and end()

stdvectorltdoublegtconst_iterator sifor (si=sbegin() si = send() si++)

stdcout ltlt si ltlt stdendl

5058

Generic Programming and the STL

Another key STL part are algorithms

double sum = accumulate(sbegin() send() 0)

Some other STL algorithms are

middot find finds the first element equal to the supplied valuemiddot count counts the number of matching elementsmiddot transform applies a supplied function to each elementmiddot for_each sweeps over all elements does not altermiddot inner_product inner product of two vectors

5158

Template Programming

Template programming provides a lsquolanguage within C++rsquocode gets evaluated during compilation

One of the simplest template examples is

template lttypename Tgtconst Tamp min(const Tamp x const Tamp y)

return y lt x y x

This can now be used to compute the minimum between twoint variables or double or in fact any admissible typeproviding an operatorlt() for less-than comparison

5258

Template Programming

Another template example is a class squaring its argument

template lttypename Tgtclass square public stdunary_functionltTTgt public

T operator()(T t) const return tt

which can be used along with STL algorithms

transform(xbegin() xend() square)

5358

Further Reading

Books by Meyers are excellent

I also like the (free) C++ Annotations

C++ FAQ

Resources on StackOverflow such as

middot general info and its links egmiddot booklist

5458

Debugging

Debugging

Some tips

middot Generally painful old-school printf() still pervasivemiddot Debuggers go along with compilers gdb for gcc and g++lldb for the clang llvm family

middot Extra tools such as valgrind helpful for memory debuggingmiddot ldquoSanitizerrdquo (ASANUBSAN) in newer versions of g++ andclang++

5658

Best Practices

Best Practices

Version control highly recommended to become familiar withgit or svn

Editor in the long-run recommended to learn productivitytricks for one editor emacs vi eclipse RStudio hellip

5858

  • Overview
    • Why R
    • Why C++
    • Vision
    • Speed
      • What Next
      • C++ Basics
        • A Better C
          • Debugging
          • Best Practices
Page 33: Higher-Performance R via C++ - Part 1: Introductiondirk.eddelbuettel.com/papers/rcpp_uzuerich_2015_part1_intro.pdf · · C++11 is a big deal giving us new language features · While

Speed Example 2 (due to StackOverflow)

Timing

library(rbenchmark)benchmark(f(10) f(15) f(20))[14]

test replications elapsed relative 1 f(10) 100 0030 1000 2 f(15) 100 0335 11167 3 f(20) 100 3517 117233

3258

Speed Example 2 (due to StackOverflow)

int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))

deployed as

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

3358

Speed Example 2 (due to StackOverflow)

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(20) g(20) order=relative)[14]

test replications elapsed relative 2 g(20) 100 0011 1000 1 f(20) 100 3776 343273

A nice gain of a few orders of magnitude3458

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

3558

What Next

Hands-on

Programming with C++

middot C++ Basicsmiddot Debuggingmiddot Best Practices

and then on to Rcpp itself

3758

C++ Basics

Compiled not Interpreted

Need to compile and link

include ltcstdiogt

int main(void) printf(Hello worldn)return 0

3958

Compiled not Interpreted

Or streams output rather than printf

include ltiostreamgt

int main(void) stdcout ltlt Hello world ltlt stdendlreturn 0

4058

Compiled not Interpreted

g++ -o will compile and link

We will now look at an examples with explicit linking

4158

Compiled not Interpreted

include ltcstdiogt

define MATHLIB_STANDALONEinclude ltRmathhgt

int main(void) printf(N(01) 95th percentile 98fn

qnorm(095 00 10 1 0))

4258

Compiled not Interpreted

We may need to supply

middot header location via -Imiddot library location via -Lmiddot library via -llibraryname

g++ -Iusrinclude -c qnorm_rmathcppg++ -o qnorm_rmath qnorm_rmatho -Lusrlib -lRmath

4358

Statically Typed

middot R is dynamically typed x lt- 314 x lt- foo is validmiddot In C++ each variable must be declared before first usemiddot Common types are int and long (possibly with unsigned)float and double bool as well as char

middot No standard string type though stdstring is closemiddot All these variables types are scalars which is fundamentally

different from R where everything is a vectormiddot class (and struct) allow creation of composite types

classes add behaviour to data to form objectsmiddot Variables need to be declared cannot change

4458

C++ Basics A Better C

C++ is a Better C

middot control structures similar to what R offers for while ifswitch

middot functions are similar too but note the difference inpositional-only matching also same function name butdifferent arguments allowed in C++

middot pointers and memory management very different but lotsof issues people had with C can be avoided via STL (whichis something Rcpp promotes too)

middot sometimes still useful to know what a pointer is hellip

4658

Object-Oriented

This is a second key feature of C++ and it does it differentlyfrom S3 and S4

struct Date unsigned int yearunsigned int monthunsigned int day

struct Person char firstname[20]char lastname[20]struct Date birthdayunsigned long id

4758

Object-Oriented

Object-orientation in the C++ sense matches data with codeoperating on it

class Date private

unsigned int yearunsigned int monthunsigned int date

publicvoid setDate(int y int m int d)int getDay()int getMonth()int getYear()

4858

Generic Programming and the STL

The STL promotes generic programming

For example the sequence container types vector dequeand list all support

middot push_back() to insert at the endmiddot pop_back() to remove from the frontmiddot begin() returning an iterator to the first elementmiddot end() returning an iterator to just after the last elementmiddot size() for the number of elements

but only list has push_front() and pop_front()

Other useful containers set multiset map and multimap4958

Generic Programming and the STL

Traversal of containers can be achieved via iterators whichrequire suitable member functions begin() and end()

stdvectorltdoublegtconst_iterator sifor (si=sbegin() si = send() si++)

stdcout ltlt si ltlt stdendl

5058

Generic Programming and the STL

Another key STL part are algorithms

double sum = accumulate(sbegin() send() 0)

Some other STL algorithms are

middot find finds the first element equal to the supplied valuemiddot count counts the number of matching elementsmiddot transform applies a supplied function to each elementmiddot for_each sweeps over all elements does not altermiddot inner_product inner product of two vectors

5158

Template Programming

Template programming provides a lsquolanguage within C++rsquocode gets evaluated during compilation

One of the simplest template examples is

template lttypename Tgtconst Tamp min(const Tamp x const Tamp y)

return y lt x y x

This can now be used to compute the minimum between twoint variables or double or in fact any admissible typeproviding an operatorlt() for less-than comparison

5258

Template Programming

Another template example is a class squaring its argument

template lttypename Tgtclass square public stdunary_functionltTTgt public

T operator()(T t) const return tt

which can be used along with STL algorithms

transform(xbegin() xend() square)

5358

Further Reading

Books by Meyers are excellent

I also like the (free) C++ Annotations

C++ FAQ

Resources on StackOverflow such as

middot general info and its links egmiddot booklist

5458

Debugging

Debugging

Some tips

middot Generally painful old-school printf() still pervasivemiddot Debuggers go along with compilers gdb for gcc and g++lldb for the clang llvm family

middot Extra tools such as valgrind helpful for memory debuggingmiddot ldquoSanitizerrdquo (ASANUBSAN) in newer versions of g++ andclang++

5658

Best Practices

Best Practices

Version control highly recommended to become familiar withgit or svn

Editor in the long-run recommended to learn productivitytricks for one editor emacs vi eclipse RStudio hellip

5858

  • Overview
    • Why R
    • Why C++
    • Vision
    • Speed
      • What Next
      • C++ Basics
        • A Better C
          • Debugging
          • Best Practices
Page 34: Higher-Performance R via C++ - Part 1: Introductiondirk.eddelbuettel.com/papers/rcpp_uzuerich_2015_part1_intro.pdf · · C++11 is a big deal giving us new language features · While

Speed Example 2 (due to StackOverflow)

int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2))

deployed as

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

sapply(010 g)

[1] 0 1 1 2 3 5 8 13 21 34 55

3358

Speed Example 2 (due to StackOverflow)

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(20) g(20) order=relative)[14]

test replications elapsed relative 2 g(20) 100 0011 1000 1 f(20) 100 3776 343273

A nice gain of a few orders of magnitude3458

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

3558

What Next

Hands-on

Programming with C++

middot C++ Basicsmiddot Debuggingmiddot Best Practices

and then on to Rcpp itself

3758

C++ Basics

Compiled not Interpreted

Need to compile and link

include ltcstdiogt

int main(void) printf(Hello worldn)return 0

3958

Compiled not Interpreted

Or streams output rather than printf

include ltiostreamgt

int main(void) stdcout ltlt Hello world ltlt stdendlreturn 0

4058

Compiled not Interpreted

g++ -o will compile and link

We will now look at an examples with explicit linking

4158

Compiled not Interpreted

include ltcstdiogt

define MATHLIB_STANDALONEinclude ltRmathhgt

int main(void) printf(N(01) 95th percentile 98fn

qnorm(095 00 10 1 0))

4258

Compiled not Interpreted

We may need to supply

middot header location via -Imiddot library location via -Lmiddot library via -llibraryname

g++ -Iusrinclude -c qnorm_rmathcppg++ -o qnorm_rmath qnorm_rmatho -Lusrlib -lRmath

4358

Statically Typed

middot R is dynamically typed x lt- 314 x lt- foo is validmiddot In C++ each variable must be declared before first usemiddot Common types are int and long (possibly with unsigned)float and double bool as well as char

middot No standard string type though stdstring is closemiddot All these variables types are scalars which is fundamentally

different from R where everything is a vectormiddot class (and struct) allow creation of composite types

classes add behaviour to data to form objectsmiddot Variables need to be declared cannot change

4458

C++ Basics A Better C

C++ is a Better C

middot control structures similar to what R offers for while ifswitch

middot functions are similar too but note the difference inpositional-only matching also same function name butdifferent arguments allowed in C++

middot pointers and memory management very different but lotsof issues people had with C can be avoided via STL (whichis something Rcpp promotes too)

middot sometimes still useful to know what a pointer is hellip

4658

Object-Oriented

This is a second key feature of C++ and it does it differentlyfrom S3 and S4

struct Date unsigned int yearunsigned int monthunsigned int day

struct Person char firstname[20]char lastname[20]struct Date birthdayunsigned long id

4758

Object-Oriented

Object-orientation in the C++ sense matches data with codeoperating on it

class Date private

unsigned int yearunsigned int monthunsigned int date

publicvoid setDate(int y int m int d)int getDay()int getMonth()int getYear()

4858

Generic Programming and the STL

The STL promotes generic programming

For example the sequence container types vector dequeand list all support

middot push_back() to insert at the endmiddot pop_back() to remove from the frontmiddot begin() returning an iterator to the first elementmiddot end() returning an iterator to just after the last elementmiddot size() for the number of elements

but only list has push_front() and pop_front()

Other useful containers set multiset map and multimap4958

Generic Programming and the STL

Traversal of containers can be achieved via iterators whichrequire suitable member functions begin() and end()

stdvectorltdoublegtconst_iterator sifor (si=sbegin() si = send() si++)

stdcout ltlt si ltlt stdendl

5058

Generic Programming and the STL

Another key STL part are algorithms

double sum = accumulate(sbegin() send() 0)

Some other STL algorithms are

middot find finds the first element equal to the supplied valuemiddot count counts the number of matching elementsmiddot transform applies a supplied function to each elementmiddot for_each sweeps over all elements does not altermiddot inner_product inner product of two vectors

5158

Template Programming

Template programming provides a lsquolanguage within C++rsquocode gets evaluated during compilation

One of the simplest template examples is

template lttypename Tgtconst Tamp min(const Tamp x const Tamp y)

return y lt x y x

This can now be used to compute the minimum between twoint variables or double or in fact any admissible typeproviding an operatorlt() for less-than comparison

5258

Template Programming

Another template example is a class squaring its argument

template lttypename Tgtclass square public stdunary_functionltTTgt public

T operator()(T t) const return tt

which can be used along with STL algorithms

transform(xbegin() xend() square)

5358

Further Reading

Books by Meyers are excellent

I also like the (free) C++ Annotations

C++ FAQ

Resources on StackOverflow such as

middot general info and its links egmiddot booklist

5458

Debugging

Debugging

Some tips

middot Generally painful old-school printf() still pervasivemiddot Debuggers go along with compilers gdb for gcc and g++lldb for the clang llvm family

middot Extra tools such as valgrind helpful for memory debuggingmiddot ldquoSanitizerrdquo (ASANUBSAN) in newer versions of g++ andclang++

5658

Best Practices

Best Practices

Version control highly recommended to become familiar withgit or svn

Editor in the long-run recommended to learn productivitytricks for one editor emacs vi eclipse RStudio hellip

5858

  • Overview
    • Why R
    • Why C++
    • Vision
    • Speed
      • What Next
      • C++ Basics
        • A Better C
          • Debugging
          • Best Practices
Page 35: Higher-Performance R via C++ - Part 1: Introductiondirk.eddelbuettel.com/papers/rcpp_uzuerich_2015_part1_intro.pdf · · C++11 is a big deal giving us new language features · While

Speed Example 2 (due to StackOverflow)

Timing

RcppcppFunction(int g(int n) if (n lt 2) return(n)return(g(n-1) + g(n-2)) )

library(rbenchmark)benchmark(f(20) g(20) order=relative)[14]

test replications elapsed relative 2 g(20) 100 0011 1000 1 f(20) 100 3776 343273

A nice gain of a few orders of magnitude3458

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

3558

What Next

Hands-on

Programming with C++

middot C++ Basicsmiddot Debuggingmiddot Best Practices

and then on to Rcpp itself

3758

C++ Basics

Compiled not Interpreted

Need to compile and link

include ltcstdiogt

int main(void) printf(Hello worldn)return 0

3958

Compiled not Interpreted

Or streams output rather than printf

include ltiostreamgt

int main(void) stdcout ltlt Hello world ltlt stdendlreturn 0

4058

Compiled not Interpreted

g++ -o will compile and link

We will now look at an examples with explicit linking

4158

Compiled not Interpreted

include ltcstdiogt

define MATHLIB_STANDALONEinclude ltRmathhgt

int main(void) printf(N(01) 95th percentile 98fn

qnorm(095 00 10 1 0))

4258

Compiled not Interpreted

We may need to supply

middot header location via -Imiddot library location via -Lmiddot library via -llibraryname

g++ -Iusrinclude -c qnorm_rmathcppg++ -o qnorm_rmath qnorm_rmatho -Lusrlib -lRmath

4358

Statically Typed

middot R is dynamically typed x lt- 314 x lt- foo is validmiddot In C++ each variable must be declared before first usemiddot Common types are int and long (possibly with unsigned)float and double bool as well as char

middot No standard string type though stdstring is closemiddot All these variables types are scalars which is fundamentally

different from R where everything is a vectormiddot class (and struct) allow creation of composite types

classes add behaviour to data to form objectsmiddot Variables need to be declared cannot change

4458

C++ Basics A Better C

C++ is a Better C

middot control structures similar to what R offers for while ifswitch

middot functions are similar too but note the difference inpositional-only matching also same function name butdifferent arguments allowed in C++

middot pointers and memory management very different but lotsof issues people had with C can be avoided via STL (whichis something Rcpp promotes too)

middot sometimes still useful to know what a pointer is hellip

4658

Object-Oriented

This is a second key feature of C++ and it does it differentlyfrom S3 and S4

struct Date unsigned int yearunsigned int monthunsigned int day

struct Person char firstname[20]char lastname[20]struct Date birthdayunsigned long id

4758

Object-Oriented

Object-orientation in the C++ sense matches data with codeoperating on it

class Date private

unsigned int yearunsigned int monthunsigned int date

publicvoid setDate(int y int m int d)int getDay()int getMonth()int getYear()

4858

Generic Programming and the STL

The STL promotes generic programming

For example the sequence container types vector dequeand list all support

middot push_back() to insert at the endmiddot pop_back() to remove from the frontmiddot begin() returning an iterator to the first elementmiddot end() returning an iterator to just after the last elementmiddot size() for the number of elements

but only list has push_front() and pop_front()

Other useful containers set multiset map and multimap4958

Generic Programming and the STL

Traversal of containers can be achieved via iterators whichrequire suitable member functions begin() and end()

stdvectorltdoublegtconst_iterator sifor (si=sbegin() si = send() si++)

stdcout ltlt si ltlt stdendl

5058

Generic Programming and the STL

Another key STL part are algorithms

double sum = accumulate(sbegin() send() 0)

Some other STL algorithms are

middot find finds the first element equal to the supplied valuemiddot count counts the number of matching elementsmiddot transform applies a supplied function to each elementmiddot for_each sweeps over all elements does not altermiddot inner_product inner product of two vectors

5158

Template Programming

Template programming provides a lsquolanguage within C++rsquocode gets evaluated during compilation

One of the simplest template examples is

template lttypename Tgtconst Tamp min(const Tamp x const Tamp y)

return y lt x y x

This can now be used to compute the minimum between twoint variables or double or in fact any admissible typeproviding an operatorlt() for less-than comparison

5258

Template Programming

Another template example is a class squaring its argument

template lttypename Tgtclass square public stdunary_functionltTTgt public

T operator()(T t) const return tt

which can be used along with STL algorithms

transform(xbegin() xend() square)

5358

Further Reading

Books by Meyers are excellent

I also like the (free) C++ Annotations

C++ FAQ

Resources on StackOverflow such as

middot general info and its links egmiddot booklist

5458

Debugging

Debugging

Some tips

middot Generally painful old-school printf() still pervasivemiddot Debuggers go along with compilers gdb for gcc and g++lldb for the clang llvm family

middot Extra tools such as valgrind helpful for memory debuggingmiddot ldquoSanitizerrdquo (ASANUBSAN) in newer versions of g++ andclang++

5658

Best Practices

Best Practices

Version control highly recommended to become familiar withgit or svn

Editor in the long-run recommended to learn productivitytricks for one editor emacs vi eclipse RStudio hellip

5858

  • Overview
    • Why R
    • Why C++
    • Vision
    • Speed
      • What Next
      • C++ Basics
        • A Better C
          • Debugging
          • Best Practices
Page 36: Higher-Performance R via C++ - Part 1: Introductiondirk.eddelbuettel.com/papers/rcpp_uzuerich_2015_part1_intro.pdf · · C++11 is a big deal giving us new language features · While

Another Angle on Speed

Run-time performance is just one example

Time to code is another metric

We feel quite strongly that helps you code more succinctlyleading to fewer bugs and faster development

A good environment helps RStudio integrates R and C++development quite nicely (eg the compiler error messageparsing is very helpful) and also helps with package building

3558

What Next

Hands-on

Programming with C++

middot C++ Basicsmiddot Debuggingmiddot Best Practices

and then on to Rcpp itself

3758

C++ Basics

Compiled not Interpreted

Need to compile and link

include ltcstdiogt

int main(void) printf(Hello worldn)return 0

3958

Compiled not Interpreted

Or streams output rather than printf

include ltiostreamgt

int main(void) stdcout ltlt Hello world ltlt stdendlreturn 0

4058

Compiled not Interpreted

g++ -o will compile and link

We will now look at an examples with explicit linking

4158

Compiled not Interpreted

include ltcstdiogt

define MATHLIB_STANDALONEinclude ltRmathhgt

int main(void) printf(N(01) 95th percentile 98fn

qnorm(095 00 10 1 0))

4258

Compiled not Interpreted

We may need to supply

middot header location via -Imiddot library location via -Lmiddot library via -llibraryname

g++ -Iusrinclude -c qnorm_rmathcppg++ -o qnorm_rmath qnorm_rmatho -Lusrlib -lRmath

4358

Statically Typed

middot R is dynamically typed x lt- 314 x lt- foo is validmiddot In C++ each variable must be declared before first usemiddot Common types are int and long (possibly with unsigned)float and double bool as well as char

middot No standard string type though stdstring is closemiddot All these variables types are scalars which is fundamentally

different from R where everything is a vectormiddot class (and struct) allow creation of composite types

classes add behaviour to data to form objectsmiddot Variables need to be declared cannot change

4458

C++ Basics A Better C

C++ is a Better C

middot control structures similar to what R offers for while ifswitch

middot functions are similar too but note the difference inpositional-only matching also same function name butdifferent arguments allowed in C++

middot pointers and memory management very different but lotsof issues people had with C can be avoided via STL (whichis something Rcpp promotes too)

middot sometimes still useful to know what a pointer is hellip

4658

Object-Oriented

This is a second key feature of C++ and it does it differentlyfrom S3 and S4

struct Date unsigned int yearunsigned int monthunsigned int day

struct Person char firstname[20]char lastname[20]struct Date birthdayunsigned long id

4758

Object-Oriented

Object-orientation in the C++ sense matches data with codeoperating on it

class Date private

unsigned int yearunsigned int monthunsigned int date

publicvoid setDate(int y int m int d)int getDay()int getMonth()int getYear()

4858

Generic Programming and the STL

The STL promotes generic programming

For example the sequence container types vector dequeand list all support

middot push_back() to insert at the endmiddot pop_back() to remove from the frontmiddot begin() returning an iterator to the first elementmiddot end() returning an iterator to just after the last elementmiddot size() for the number of elements

but only list has push_front() and pop_front()

Other useful containers set multiset map and multimap4958

Generic Programming and the STL

Traversal of containers can be achieved via iterators whichrequire suitable member functions begin() and end()

stdvectorltdoublegtconst_iterator sifor (si=sbegin() si = send() si++)

stdcout ltlt si ltlt stdendl

5058

Generic Programming and the STL

Another key STL part are algorithms

double sum = accumulate(sbegin() send() 0)

Some other STL algorithms are

middot find finds the first element equal to the supplied valuemiddot count counts the number of matching elementsmiddot transform applies a supplied function to each elementmiddot for_each sweeps over all elements does not altermiddot inner_product inner product of two vectors

5158

Template Programming

Template programming provides a lsquolanguage within C++rsquocode gets evaluated during compilation

One of the simplest template examples is

template lttypename Tgtconst Tamp min(const Tamp x const Tamp y)

return y lt x y x

This can now be used to compute the minimum between twoint variables or double or in fact any admissible typeproviding an operatorlt() for less-than comparison

5258

Template Programming

Another template example is a class squaring its argument

template lttypename Tgtclass square public stdunary_functionltTTgt public

T operator()(T t) const return tt

which can be used along with STL algorithms

transform(xbegin() xend() square)

5358

Further Reading

Books by Meyers are excellent

I also like the (free) C++ Annotations

C++ FAQ

Resources on StackOverflow such as

middot general info and its links egmiddot booklist

5458

Debugging

Debugging

Some tips

middot Generally painful old-school printf() still pervasivemiddot Debuggers go along with compilers gdb for gcc and g++lldb for the clang llvm family

middot Extra tools such as valgrind helpful for memory debuggingmiddot ldquoSanitizerrdquo (ASANUBSAN) in newer versions of g++ andclang++

5658

Best Practices

Best Practices

Version control highly recommended to become familiar withgit or svn

Editor in the long-run recommended to learn productivitytricks for one editor emacs vi eclipse RStudio hellip

5858

  • Overview
    • Why R
    • Why C++
    • Vision
    • Speed
      • What Next
      • C++ Basics
        • A Better C
          • Debugging
          • Best Practices
Page 37: Higher-Performance R via C++ - Part 1: Introductiondirk.eddelbuettel.com/papers/rcpp_uzuerich_2015_part1_intro.pdf · · C++11 is a big deal giving us new language features · While

What Next

Hands-on

Programming with C++

middot C++ Basicsmiddot Debuggingmiddot Best Practices

and then on to Rcpp itself

3758

C++ Basics

Compiled not Interpreted

Need to compile and link

include ltcstdiogt

int main(void) printf(Hello worldn)return 0

3958

Compiled not Interpreted

Or streams output rather than printf

include ltiostreamgt

int main(void) stdcout ltlt Hello world ltlt stdendlreturn 0

4058

Compiled not Interpreted

g++ -o will compile and link

We will now look at an examples with explicit linking

4158

Compiled not Interpreted

include ltcstdiogt

define MATHLIB_STANDALONEinclude ltRmathhgt

int main(void) printf(N(01) 95th percentile 98fn

qnorm(095 00 10 1 0))

4258

Compiled not Interpreted

We may need to supply

middot header location via -Imiddot library location via -Lmiddot library via -llibraryname

g++ -Iusrinclude -c qnorm_rmathcppg++ -o qnorm_rmath qnorm_rmatho -Lusrlib -lRmath

4358

Statically Typed

middot R is dynamically typed x lt- 314 x lt- foo is validmiddot In C++ each variable must be declared before first usemiddot Common types are int and long (possibly with unsigned)float and double bool as well as char

middot No standard string type though stdstring is closemiddot All these variables types are scalars which is fundamentally

different from R where everything is a vectormiddot class (and struct) allow creation of composite types

classes add behaviour to data to form objectsmiddot Variables need to be declared cannot change

4458

C++ Basics A Better C

C++ is a Better C

middot control structures similar to what R offers for while ifswitch

middot functions are similar too but note the difference inpositional-only matching also same function name butdifferent arguments allowed in C++

middot pointers and memory management very different but lotsof issues people had with C can be avoided via STL (whichis something Rcpp promotes too)

middot sometimes still useful to know what a pointer is hellip

4658

Object-Oriented

This is a second key feature of C++ and it does it differentlyfrom S3 and S4

struct Date unsigned int yearunsigned int monthunsigned int day

struct Person char firstname[20]char lastname[20]struct Date birthdayunsigned long id

4758

Object-Oriented

Object-orientation in the C++ sense matches data with codeoperating on it

class Date private

unsigned int yearunsigned int monthunsigned int date

publicvoid setDate(int y int m int d)int getDay()int getMonth()int getYear()

4858

Generic Programming and the STL

The STL promotes generic programming

For example the sequence container types vector dequeand list all support

middot push_back() to insert at the endmiddot pop_back() to remove from the frontmiddot begin() returning an iterator to the first elementmiddot end() returning an iterator to just after the last elementmiddot size() for the number of elements

but only list has push_front() and pop_front()

Other useful containers set multiset map and multimap4958

Generic Programming and the STL

Traversal of containers can be achieved via iterators whichrequire suitable member functions begin() and end()

stdvectorltdoublegtconst_iterator sifor (si=sbegin() si = send() si++)

stdcout ltlt si ltlt stdendl

5058

Generic Programming and the STL

Another key STL part are algorithms

double sum = accumulate(sbegin() send() 0)

Some other STL algorithms are

middot find finds the first element equal to the supplied valuemiddot count counts the number of matching elementsmiddot transform applies a supplied function to each elementmiddot for_each sweeps over all elements does not altermiddot inner_product inner product of two vectors

5158

Template Programming

Template programming provides a lsquolanguage within C++rsquocode gets evaluated during compilation

One of the simplest template examples is

template lttypename Tgtconst Tamp min(const Tamp x const Tamp y)

return y lt x y x

This can now be used to compute the minimum between twoint variables or double or in fact any admissible typeproviding an operatorlt() for less-than comparison

5258

Template Programming

Another template example is a class squaring its argument

template lttypename Tgtclass square public stdunary_functionltTTgt public

T operator()(T t) const return tt

which can be used along with STL algorithms

transform(xbegin() xend() square)

5358

Further Reading

Books by Meyers are excellent

I also like the (free) C++ Annotations

C++ FAQ

Resources on StackOverflow such as

middot general info and its links egmiddot booklist

5458

Debugging

Debugging

Some tips

middot Generally painful old-school printf() still pervasivemiddot Debuggers go along with compilers gdb for gcc and g++lldb for the clang llvm family

middot Extra tools such as valgrind helpful for memory debuggingmiddot ldquoSanitizerrdquo (ASANUBSAN) in newer versions of g++ andclang++

5658

Best Practices

Best Practices

Version control highly recommended to become familiar withgit or svn

Editor in the long-run recommended to learn productivitytricks for one editor emacs vi eclipse RStudio hellip

5858

  • Overview
    • Why R
    • Why C++
    • Vision
    • Speed
      • What Next
      • C++ Basics
        • A Better C
          • Debugging
          • Best Practices
Page 38: Higher-Performance R via C++ - Part 1: Introductiondirk.eddelbuettel.com/papers/rcpp_uzuerich_2015_part1_intro.pdf · · C++11 is a big deal giving us new language features · While

Hands-on

Programming with C++

middot C++ Basicsmiddot Debuggingmiddot Best Practices

and then on to Rcpp itself

3758

C++ Basics

Compiled not Interpreted

Need to compile and link

include ltcstdiogt

int main(void) printf(Hello worldn)return 0

3958

Compiled not Interpreted

Or streams output rather than printf

include ltiostreamgt

int main(void) stdcout ltlt Hello world ltlt stdendlreturn 0

4058

Compiled not Interpreted

g++ -o will compile and link

We will now look at an examples with explicit linking

4158

Compiled not Interpreted

include ltcstdiogt

define MATHLIB_STANDALONEinclude ltRmathhgt

int main(void) printf(N(01) 95th percentile 98fn

qnorm(095 00 10 1 0))

4258

Compiled not Interpreted

We may need to supply

middot header location via -Imiddot library location via -Lmiddot library via -llibraryname

g++ -Iusrinclude -c qnorm_rmathcppg++ -o qnorm_rmath qnorm_rmatho -Lusrlib -lRmath

4358

Statically Typed

middot R is dynamically typed x lt- 314 x lt- foo is validmiddot In C++ each variable must be declared before first usemiddot Common types are int and long (possibly with unsigned)float and double bool as well as char

middot No standard string type though stdstring is closemiddot All these variables types are scalars which is fundamentally

different from R where everything is a vectormiddot class (and struct) allow creation of composite types

classes add behaviour to data to form objectsmiddot Variables need to be declared cannot change

4458

C++ Basics A Better C

C++ is a Better C

middot control structures similar to what R offers for while ifswitch

middot functions are similar too but note the difference inpositional-only matching also same function name butdifferent arguments allowed in C++

middot pointers and memory management very different but lotsof issues people had with C can be avoided via STL (whichis something Rcpp promotes too)

middot sometimes still useful to know what a pointer is hellip

4658

Object-Oriented

This is a second key feature of C++ and it does it differentlyfrom S3 and S4

struct Date unsigned int yearunsigned int monthunsigned int day

struct Person char firstname[20]char lastname[20]struct Date birthdayunsigned long id

4758

Object-Oriented

Object-orientation in the C++ sense matches data with codeoperating on it

class Date private

unsigned int yearunsigned int monthunsigned int date

publicvoid setDate(int y int m int d)int getDay()int getMonth()int getYear()

4858

Generic Programming and the STL

The STL promotes generic programming

For example the sequence container types vector dequeand list all support

middot push_back() to insert at the endmiddot pop_back() to remove from the frontmiddot begin() returning an iterator to the first elementmiddot end() returning an iterator to just after the last elementmiddot size() for the number of elements

but only list has push_front() and pop_front()

Other useful containers set multiset map and multimap4958

Generic Programming and the STL

Traversal of containers can be achieved via iterators whichrequire suitable member functions begin() and end()

stdvectorltdoublegtconst_iterator sifor (si=sbegin() si = send() si++)

stdcout ltlt si ltlt stdendl

5058

Generic Programming and the STL

Another key STL part are algorithms

double sum = accumulate(sbegin() send() 0)

Some other STL algorithms are

middot find finds the first element equal to the supplied valuemiddot count counts the number of matching elementsmiddot transform applies a supplied function to each elementmiddot for_each sweeps over all elements does not altermiddot inner_product inner product of two vectors

5158

Template Programming

Template programming provides a lsquolanguage within C++rsquocode gets evaluated during compilation

One of the simplest template examples is

template lttypename Tgtconst Tamp min(const Tamp x const Tamp y)

return y lt x y x

This can now be used to compute the minimum between twoint variables or double or in fact any admissible typeproviding an operatorlt() for less-than comparison

5258

Template Programming

Another template example is a class squaring its argument

template lttypename Tgtclass square public stdunary_functionltTTgt public

T operator()(T t) const return tt

which can be used along with STL algorithms

transform(xbegin() xend() square)

5358

Further Reading

Books by Meyers are excellent

I also like the (free) C++ Annotations

C++ FAQ

Resources on StackOverflow such as

middot general info and its links egmiddot booklist

5458

Debugging

Debugging

Some tips

middot Generally painful old-school printf() still pervasivemiddot Debuggers go along with compilers gdb for gcc and g++lldb for the clang llvm family

middot Extra tools such as valgrind helpful for memory debuggingmiddot ldquoSanitizerrdquo (ASANUBSAN) in newer versions of g++ andclang++

5658

Best Practices

Best Practices

Version control highly recommended to become familiar withgit or svn

Editor in the long-run recommended to learn productivitytricks for one editor emacs vi eclipse RStudio hellip

5858

  • Overview
    • Why R
    • Why C++
    • Vision
    • Speed
      • What Next
      • C++ Basics
        • A Better C
          • Debugging
          • Best Practices
Page 39: Higher-Performance R via C++ - Part 1: Introductiondirk.eddelbuettel.com/papers/rcpp_uzuerich_2015_part1_intro.pdf · · C++11 is a big deal giving us new language features · While

C++ Basics

Compiled not Interpreted

Need to compile and link

include ltcstdiogt

int main(void) printf(Hello worldn)return 0

3958

Compiled not Interpreted

Or streams output rather than printf

include ltiostreamgt

int main(void) stdcout ltlt Hello world ltlt stdendlreturn 0

4058

Compiled not Interpreted

g++ -o will compile and link

We will now look at an examples with explicit linking

4158

Compiled not Interpreted

include ltcstdiogt

define MATHLIB_STANDALONEinclude ltRmathhgt

int main(void) printf(N(01) 95th percentile 98fn

qnorm(095 00 10 1 0))

4258

Compiled not Interpreted

We may need to supply

middot header location via -Imiddot library location via -Lmiddot library via -llibraryname

g++ -Iusrinclude -c qnorm_rmathcppg++ -o qnorm_rmath qnorm_rmatho -Lusrlib -lRmath

4358

Statically Typed

middot R is dynamically typed x lt- 314 x lt- foo is validmiddot In C++ each variable must be declared before first usemiddot Common types are int and long (possibly with unsigned)float and double bool as well as char

middot No standard string type though stdstring is closemiddot All these variables types are scalars which is fundamentally

different from R where everything is a vectormiddot class (and struct) allow creation of composite types

classes add behaviour to data to form objectsmiddot Variables need to be declared cannot change

4458

C++ Basics A Better C

C++ is a Better C

middot control structures similar to what R offers for while ifswitch

middot functions are similar too but note the difference inpositional-only matching also same function name butdifferent arguments allowed in C++

middot pointers and memory management very different but lotsof issues people had with C can be avoided via STL (whichis something Rcpp promotes too)

middot sometimes still useful to know what a pointer is hellip

4658

Object-Oriented

This is a second key feature of C++ and it does it differentlyfrom S3 and S4

struct Date unsigned int yearunsigned int monthunsigned int day

struct Person char firstname[20]char lastname[20]struct Date birthdayunsigned long id

4758

Object-Oriented

Object-orientation in the C++ sense matches data with codeoperating on it

class Date private

unsigned int yearunsigned int monthunsigned int date

publicvoid setDate(int y int m int d)int getDay()int getMonth()int getYear()

4858

Generic Programming and the STL

The STL promotes generic programming

For example the sequence container types vector dequeand list all support

middot push_back() to insert at the endmiddot pop_back() to remove from the frontmiddot begin() returning an iterator to the first elementmiddot end() returning an iterator to just after the last elementmiddot size() for the number of elements

but only list has push_front() and pop_front()

Other useful containers set multiset map and multimap4958

Generic Programming and the STL

Traversal of containers can be achieved via iterators whichrequire suitable member functions begin() and end()

stdvectorltdoublegtconst_iterator sifor (si=sbegin() si = send() si++)

stdcout ltlt si ltlt stdendl

5058

Generic Programming and the STL

Another key STL part are algorithms

double sum = accumulate(sbegin() send() 0)

Some other STL algorithms are

middot find finds the first element equal to the supplied valuemiddot count counts the number of matching elementsmiddot transform applies a supplied function to each elementmiddot for_each sweeps over all elements does not altermiddot inner_product inner product of two vectors

5158

Template Programming

Template programming provides a lsquolanguage within C++rsquocode gets evaluated during compilation

One of the simplest template examples is

template lttypename Tgtconst Tamp min(const Tamp x const Tamp y)

return y lt x y x

This can now be used to compute the minimum between twoint variables or double or in fact any admissible typeproviding an operatorlt() for less-than comparison

5258

Template Programming

Another template example is a class squaring its argument

template lttypename Tgtclass square public stdunary_functionltTTgt public

T operator()(T t) const return tt

which can be used along with STL algorithms

transform(xbegin() xend() square)

5358

Further Reading

Books by Meyers are excellent

I also like the (free) C++ Annotations

C++ FAQ

Resources on StackOverflow such as

middot general info and its links egmiddot booklist

5458

Debugging

Debugging

Some tips

middot Generally painful old-school printf() still pervasivemiddot Debuggers go along with compilers gdb for gcc and g++lldb for the clang llvm family

middot Extra tools such as valgrind helpful for memory debuggingmiddot ldquoSanitizerrdquo (ASANUBSAN) in newer versions of g++ andclang++

5658

Best Practices

Best Practices

Version control highly recommended to become familiar withgit or svn

Editor in the long-run recommended to learn productivitytricks for one editor emacs vi eclipse RStudio hellip

5858

  • Overview
    • Why R
    • Why C++
    • Vision
    • Speed
      • What Next
      • C++ Basics
        • A Better C
          • Debugging
          • Best Practices
Page 40: Higher-Performance R via C++ - Part 1: Introductiondirk.eddelbuettel.com/papers/rcpp_uzuerich_2015_part1_intro.pdf · · C++11 is a big deal giving us new language features · While

Compiled not Interpreted

Need to compile and link

include ltcstdiogt

int main(void) printf(Hello worldn)return 0

3958

Compiled not Interpreted

Or streams output rather than printf

include ltiostreamgt

int main(void) stdcout ltlt Hello world ltlt stdendlreturn 0

4058

Compiled not Interpreted

g++ -o will compile and link

We will now look at an examples with explicit linking

4158

Compiled not Interpreted

include ltcstdiogt

define MATHLIB_STANDALONEinclude ltRmathhgt

int main(void) printf(N(01) 95th percentile 98fn

qnorm(095 00 10 1 0))

4258

Compiled not Interpreted

We may need to supply

middot header location via -Imiddot library location via -Lmiddot library via -llibraryname

g++ -Iusrinclude -c qnorm_rmathcppg++ -o qnorm_rmath qnorm_rmatho -Lusrlib -lRmath

4358

Statically Typed

middot R is dynamically typed x lt- 314 x lt- foo is validmiddot In C++ each variable must be declared before first usemiddot Common types are int and long (possibly with unsigned)float and double bool as well as char

middot No standard string type though stdstring is closemiddot All these variables types are scalars which is fundamentally

different from R where everything is a vectormiddot class (and struct) allow creation of composite types

classes add behaviour to data to form objectsmiddot Variables need to be declared cannot change

4458

C++ Basics A Better C

C++ is a Better C

middot control structures similar to what R offers for while ifswitch

middot functions are similar too but note the difference inpositional-only matching also same function name butdifferent arguments allowed in C++

middot pointers and memory management very different but lotsof issues people had with C can be avoided via STL (whichis something Rcpp promotes too)

middot sometimes still useful to know what a pointer is hellip

4658

Object-Oriented

This is a second key feature of C++ and it does it differentlyfrom S3 and S4

struct Date unsigned int yearunsigned int monthunsigned int day

struct Person char firstname[20]char lastname[20]struct Date birthdayunsigned long id

4758

Object-Oriented

Object-orientation in the C++ sense matches data with codeoperating on it

class Date private

unsigned int yearunsigned int monthunsigned int date

publicvoid setDate(int y int m int d)int getDay()int getMonth()int getYear()

4858

Generic Programming and the STL

The STL promotes generic programming

For example the sequence container types vector dequeand list all support

middot push_back() to insert at the endmiddot pop_back() to remove from the frontmiddot begin() returning an iterator to the first elementmiddot end() returning an iterator to just after the last elementmiddot size() for the number of elements

but only list has push_front() and pop_front()

Other useful containers set multiset map and multimap4958

Generic Programming and the STL

Traversal of containers can be achieved via iterators whichrequire suitable member functions begin() and end()

stdvectorltdoublegtconst_iterator sifor (si=sbegin() si = send() si++)

stdcout ltlt si ltlt stdendl

5058

Generic Programming and the STL

Another key STL part are algorithms

double sum = accumulate(sbegin() send() 0)

Some other STL algorithms are

middot find finds the first element equal to the supplied valuemiddot count counts the number of matching elementsmiddot transform applies a supplied function to each elementmiddot for_each sweeps over all elements does not altermiddot inner_product inner product of two vectors

5158

Template Programming

Template programming provides a lsquolanguage within C++rsquocode gets evaluated during compilation

One of the simplest template examples is

template lttypename Tgtconst Tamp min(const Tamp x const Tamp y)

return y lt x y x

This can now be used to compute the minimum between twoint variables or double or in fact any admissible typeproviding an operatorlt() for less-than comparison

5258

Template Programming

Another template example is a class squaring its argument

template lttypename Tgtclass square public stdunary_functionltTTgt public

T operator()(T t) const return tt

which can be used along with STL algorithms

transform(xbegin() xend() square)

5358

Further Reading

Books by Meyers are excellent

I also like the (free) C++ Annotations

C++ FAQ

Resources on StackOverflow such as

middot general info and its links egmiddot booklist

5458

Debugging

Debugging

Some tips

middot Generally painful old-school printf() still pervasivemiddot Debuggers go along with compilers gdb for gcc and g++lldb for the clang llvm family

middot Extra tools such as valgrind helpful for memory debuggingmiddot ldquoSanitizerrdquo (ASANUBSAN) in newer versions of g++ andclang++

5658

Best Practices

Best Practices

Version control highly recommended to become familiar withgit or svn

Editor in the long-run recommended to learn productivitytricks for one editor emacs vi eclipse RStudio hellip

5858

  • Overview
    • Why R
    • Why C++
    • Vision
    • Speed
      • What Next
      • C++ Basics
        • A Better C
          • Debugging
          • Best Practices
Page 41: Higher-Performance R via C++ - Part 1: Introductiondirk.eddelbuettel.com/papers/rcpp_uzuerich_2015_part1_intro.pdf · · C++11 is a big deal giving us new language features · While

Compiled not Interpreted

Or streams output rather than printf

include ltiostreamgt

int main(void) stdcout ltlt Hello world ltlt stdendlreturn 0

4058

Compiled not Interpreted

g++ -o will compile and link

We will now look at an examples with explicit linking

4158

Compiled not Interpreted

include ltcstdiogt

define MATHLIB_STANDALONEinclude ltRmathhgt

int main(void) printf(N(01) 95th percentile 98fn

qnorm(095 00 10 1 0))

4258

Compiled not Interpreted

We may need to supply

middot header location via -Imiddot library location via -Lmiddot library via -llibraryname

g++ -Iusrinclude -c qnorm_rmathcppg++ -o qnorm_rmath qnorm_rmatho -Lusrlib -lRmath

4358

Statically Typed

middot R is dynamically typed x lt- 314 x lt- foo is validmiddot In C++ each variable must be declared before first usemiddot Common types are int and long (possibly with unsigned)float and double bool as well as char

middot No standard string type though stdstring is closemiddot All these variables types are scalars which is fundamentally

different from R where everything is a vectormiddot class (and struct) allow creation of composite types

classes add behaviour to data to form objectsmiddot Variables need to be declared cannot change

4458

C++ Basics A Better C

C++ is a Better C

middot control structures similar to what R offers for while ifswitch

middot functions are similar too but note the difference inpositional-only matching also same function name butdifferent arguments allowed in C++

middot pointers and memory management very different but lotsof issues people had with C can be avoided via STL (whichis something Rcpp promotes too)

middot sometimes still useful to know what a pointer is hellip

4658

Object-Oriented

This is a second key feature of C++ and it does it differentlyfrom S3 and S4

struct Date unsigned int yearunsigned int monthunsigned int day

struct Person char firstname[20]char lastname[20]struct Date birthdayunsigned long id

4758

Object-Oriented

Object-orientation in the C++ sense matches data with codeoperating on it

class Date private

unsigned int yearunsigned int monthunsigned int date

publicvoid setDate(int y int m int d)int getDay()int getMonth()int getYear()

4858

Generic Programming and the STL

The STL promotes generic programming

For example the sequence container types vector dequeand list all support

middot push_back() to insert at the endmiddot pop_back() to remove from the frontmiddot begin() returning an iterator to the first elementmiddot end() returning an iterator to just after the last elementmiddot size() for the number of elements

but only list has push_front() and pop_front()

Other useful containers set multiset map and multimap4958

Generic Programming and the STL

Traversal of containers can be achieved via iterators whichrequire suitable member functions begin() and end()

stdvectorltdoublegtconst_iterator sifor (si=sbegin() si = send() si++)

stdcout ltlt si ltlt stdendl

5058

Generic Programming and the STL

Another key STL part are algorithms

double sum = accumulate(sbegin() send() 0)

Some other STL algorithms are

middot find finds the first element equal to the supplied valuemiddot count counts the number of matching elementsmiddot transform applies a supplied function to each elementmiddot for_each sweeps over all elements does not altermiddot inner_product inner product of two vectors

5158

Template Programming

Template programming provides a lsquolanguage within C++rsquocode gets evaluated during compilation

One of the simplest template examples is

template lttypename Tgtconst Tamp min(const Tamp x const Tamp y)

return y lt x y x

This can now be used to compute the minimum between twoint variables or double or in fact any admissible typeproviding an operatorlt() for less-than comparison

5258

Template Programming

Another template example is a class squaring its argument

template lttypename Tgtclass square public stdunary_functionltTTgt public

T operator()(T t) const return tt

which can be used along with STL algorithms

transform(xbegin() xend() square)

5358

Further Reading

Books by Meyers are excellent

I also like the (free) C++ Annotations

C++ FAQ

Resources on StackOverflow such as

middot general info and its links egmiddot booklist

5458

Debugging

Debugging

Some tips

middot Generally painful old-school printf() still pervasivemiddot Debuggers go along with compilers gdb for gcc and g++lldb for the clang llvm family

middot Extra tools such as valgrind helpful for memory debuggingmiddot ldquoSanitizerrdquo (ASANUBSAN) in newer versions of g++ andclang++

5658

Best Practices

Best Practices

Version control highly recommended to become familiar withgit or svn

Editor in the long-run recommended to learn productivitytricks for one editor emacs vi eclipse RStudio hellip

5858

  • Overview
    • Why R
    • Why C++
    • Vision
    • Speed
      • What Next
      • C++ Basics
        • A Better C
          • Debugging
          • Best Practices
Page 42: Higher-Performance R via C++ - Part 1: Introductiondirk.eddelbuettel.com/papers/rcpp_uzuerich_2015_part1_intro.pdf · · C++11 is a big deal giving us new language features · While

Compiled not Interpreted

g++ -o will compile and link

We will now look at an examples with explicit linking

4158

Compiled not Interpreted

include ltcstdiogt

define MATHLIB_STANDALONEinclude ltRmathhgt

int main(void) printf(N(01) 95th percentile 98fn

qnorm(095 00 10 1 0))

4258

Compiled not Interpreted

We may need to supply

middot header location via -Imiddot library location via -Lmiddot library via -llibraryname

g++ -Iusrinclude -c qnorm_rmathcppg++ -o qnorm_rmath qnorm_rmatho -Lusrlib -lRmath

4358

Statically Typed

middot R is dynamically typed x lt- 314 x lt- foo is validmiddot In C++ each variable must be declared before first usemiddot Common types are int and long (possibly with unsigned)float and double bool as well as char

middot No standard string type though stdstring is closemiddot All these variables types are scalars which is fundamentally

different from R where everything is a vectormiddot class (and struct) allow creation of composite types

classes add behaviour to data to form objectsmiddot Variables need to be declared cannot change

4458

C++ Basics A Better C

C++ is a Better C

middot control structures similar to what R offers for while ifswitch

middot functions are similar too but note the difference inpositional-only matching also same function name butdifferent arguments allowed in C++

middot pointers and memory management very different but lotsof issues people had with C can be avoided via STL (whichis something Rcpp promotes too)

middot sometimes still useful to know what a pointer is hellip

4658

Object-Oriented

This is a second key feature of C++ and it does it differentlyfrom S3 and S4

struct Date unsigned int yearunsigned int monthunsigned int day

struct Person char firstname[20]char lastname[20]struct Date birthdayunsigned long id

4758

Object-Oriented

Object-orientation in the C++ sense matches data with codeoperating on it

class Date private

unsigned int yearunsigned int monthunsigned int date

publicvoid setDate(int y int m int d)int getDay()int getMonth()int getYear()

4858

Generic Programming and the STL

The STL promotes generic programming

For example the sequence container types vector dequeand list all support

middot push_back() to insert at the endmiddot pop_back() to remove from the frontmiddot begin() returning an iterator to the first elementmiddot end() returning an iterator to just after the last elementmiddot size() for the number of elements

but only list has push_front() and pop_front()

Other useful containers set multiset map and multimap4958

Generic Programming and the STL

Traversal of containers can be achieved via iterators whichrequire suitable member functions begin() and end()

stdvectorltdoublegtconst_iterator sifor (si=sbegin() si = send() si++)

stdcout ltlt si ltlt stdendl

5058

Generic Programming and the STL

Another key STL part are algorithms

double sum = accumulate(sbegin() send() 0)

Some other STL algorithms are

middot find finds the first element equal to the supplied valuemiddot count counts the number of matching elementsmiddot transform applies a supplied function to each elementmiddot for_each sweeps over all elements does not altermiddot inner_product inner product of two vectors

5158

Template Programming

Template programming provides a lsquolanguage within C++rsquocode gets evaluated during compilation

One of the simplest template examples is

template lttypename Tgtconst Tamp min(const Tamp x const Tamp y)

return y lt x y x

This can now be used to compute the minimum between twoint variables or double or in fact any admissible typeproviding an operatorlt() for less-than comparison

5258

Template Programming

Another template example is a class squaring its argument

template lttypename Tgtclass square public stdunary_functionltTTgt public

T operator()(T t) const return tt

which can be used along with STL algorithms

transform(xbegin() xend() square)

5358

Further Reading

Books by Meyers are excellent

I also like the (free) C++ Annotations

C++ FAQ

Resources on StackOverflow such as

middot general info and its links egmiddot booklist

5458

Debugging

Debugging

Some tips

middot Generally painful old-school printf() still pervasivemiddot Debuggers go along with compilers gdb for gcc and g++lldb for the clang llvm family

middot Extra tools such as valgrind helpful for memory debuggingmiddot ldquoSanitizerrdquo (ASANUBSAN) in newer versions of g++ andclang++

5658

Best Practices

Best Practices

Version control highly recommended to become familiar withgit or svn

Editor in the long-run recommended to learn productivitytricks for one editor emacs vi eclipse RStudio hellip

5858

  • Overview
    • Why R
    • Why C++
    • Vision
    • Speed
      • What Next
      • C++ Basics
        • A Better C
          • Debugging
          • Best Practices
Page 43: Higher-Performance R via C++ - Part 1: Introductiondirk.eddelbuettel.com/papers/rcpp_uzuerich_2015_part1_intro.pdf · · C++11 is a big deal giving us new language features · While

Compiled not Interpreted

include ltcstdiogt

define MATHLIB_STANDALONEinclude ltRmathhgt

int main(void) printf(N(01) 95th percentile 98fn

qnorm(095 00 10 1 0))

4258

Compiled not Interpreted

We may need to supply

middot header location via -Imiddot library location via -Lmiddot library via -llibraryname

g++ -Iusrinclude -c qnorm_rmathcppg++ -o qnorm_rmath qnorm_rmatho -Lusrlib -lRmath

4358

Statically Typed

middot R is dynamically typed x lt- 314 x lt- foo is validmiddot In C++ each variable must be declared before first usemiddot Common types are int and long (possibly with unsigned)float and double bool as well as char

middot No standard string type though stdstring is closemiddot All these variables types are scalars which is fundamentally

different from R where everything is a vectormiddot class (and struct) allow creation of composite types

classes add behaviour to data to form objectsmiddot Variables need to be declared cannot change

4458

C++ Basics A Better C

C++ is a Better C

middot control structures similar to what R offers for while ifswitch

middot functions are similar too but note the difference inpositional-only matching also same function name butdifferent arguments allowed in C++

middot pointers and memory management very different but lotsof issues people had with C can be avoided via STL (whichis something Rcpp promotes too)

middot sometimes still useful to know what a pointer is hellip

4658

Object-Oriented

This is a second key feature of C++ and it does it differentlyfrom S3 and S4

struct Date unsigned int yearunsigned int monthunsigned int day

struct Person char firstname[20]char lastname[20]struct Date birthdayunsigned long id

4758

Object-Oriented

Object-orientation in the C++ sense matches data with codeoperating on it

class Date private

unsigned int yearunsigned int monthunsigned int date

publicvoid setDate(int y int m int d)int getDay()int getMonth()int getYear()

4858

Generic Programming and the STL

The STL promotes generic programming

For example the sequence container types vector dequeand list all support

middot push_back() to insert at the endmiddot pop_back() to remove from the frontmiddot begin() returning an iterator to the first elementmiddot end() returning an iterator to just after the last elementmiddot size() for the number of elements

but only list has push_front() and pop_front()

Other useful containers set multiset map and multimap4958

Generic Programming and the STL

Traversal of containers can be achieved via iterators whichrequire suitable member functions begin() and end()

stdvectorltdoublegtconst_iterator sifor (si=sbegin() si = send() si++)

stdcout ltlt si ltlt stdendl

5058

Generic Programming and the STL

Another key STL part are algorithms

double sum = accumulate(sbegin() send() 0)

Some other STL algorithms are

middot find finds the first element equal to the supplied valuemiddot count counts the number of matching elementsmiddot transform applies a supplied function to each elementmiddot for_each sweeps over all elements does not altermiddot inner_product inner product of two vectors

5158

Template Programming

Template programming provides a lsquolanguage within C++rsquocode gets evaluated during compilation

One of the simplest template examples is

template lttypename Tgtconst Tamp min(const Tamp x const Tamp y)

return y lt x y x

This can now be used to compute the minimum between twoint variables or double or in fact any admissible typeproviding an operatorlt() for less-than comparison

5258

Template Programming

Another template example is a class squaring its argument

template lttypename Tgtclass square public stdunary_functionltTTgt public

T operator()(T t) const return tt

which can be used along with STL algorithms

transform(xbegin() xend() square)

5358

Further Reading

Books by Meyers are excellent

I also like the (free) C++ Annotations

C++ FAQ

Resources on StackOverflow such as

middot general info and its links egmiddot booklist

5458

Debugging

Debugging

Some tips

middot Generally painful old-school printf() still pervasivemiddot Debuggers go along with compilers gdb for gcc and g++lldb for the clang llvm family

middot Extra tools such as valgrind helpful for memory debuggingmiddot ldquoSanitizerrdquo (ASANUBSAN) in newer versions of g++ andclang++

5658

Best Practices

Best Practices

Version control highly recommended to become familiar withgit or svn

Editor in the long-run recommended to learn productivitytricks for one editor emacs vi eclipse RStudio hellip

5858

  • Overview
    • Why R
    • Why C++
    • Vision
    • Speed
      • What Next
      • C++ Basics
        • A Better C
          • Debugging
          • Best Practices
Page 44: Higher-Performance R via C++ - Part 1: Introductiondirk.eddelbuettel.com/papers/rcpp_uzuerich_2015_part1_intro.pdf · · C++11 is a big deal giving us new language features · While

Compiled not Interpreted

We may need to supply

middot header location via -Imiddot library location via -Lmiddot library via -llibraryname

g++ -Iusrinclude -c qnorm_rmathcppg++ -o qnorm_rmath qnorm_rmatho -Lusrlib -lRmath

4358

Statically Typed

middot R is dynamically typed x lt- 314 x lt- foo is validmiddot In C++ each variable must be declared before first usemiddot Common types are int and long (possibly with unsigned)float and double bool as well as char

middot No standard string type though stdstring is closemiddot All these variables types are scalars which is fundamentally

different from R where everything is a vectormiddot class (and struct) allow creation of composite types

classes add behaviour to data to form objectsmiddot Variables need to be declared cannot change

4458

C++ Basics A Better C

C++ is a Better C

middot control structures similar to what R offers for while ifswitch

middot functions are similar too but note the difference inpositional-only matching also same function name butdifferent arguments allowed in C++

middot pointers and memory management very different but lotsof issues people had with C can be avoided via STL (whichis something Rcpp promotes too)

middot sometimes still useful to know what a pointer is hellip

4658

Object-Oriented

This is a second key feature of C++ and it does it differentlyfrom S3 and S4

struct Date unsigned int yearunsigned int monthunsigned int day

struct Person char firstname[20]char lastname[20]struct Date birthdayunsigned long id

4758

Object-Oriented

Object-orientation in the C++ sense matches data with codeoperating on it

class Date private

unsigned int yearunsigned int monthunsigned int date

publicvoid setDate(int y int m int d)int getDay()int getMonth()int getYear()

4858

Generic Programming and the STL

The STL promotes generic programming

For example the sequence container types vector dequeand list all support

middot push_back() to insert at the endmiddot pop_back() to remove from the frontmiddot begin() returning an iterator to the first elementmiddot end() returning an iterator to just after the last elementmiddot size() for the number of elements

but only list has push_front() and pop_front()

Other useful containers set multiset map and multimap4958

Generic Programming and the STL

Traversal of containers can be achieved via iterators whichrequire suitable member functions begin() and end()

stdvectorltdoublegtconst_iterator sifor (si=sbegin() si = send() si++)

stdcout ltlt si ltlt stdendl

5058

Generic Programming and the STL

Another key STL part are algorithms

double sum = accumulate(sbegin() send() 0)

Some other STL algorithms are

middot find finds the first element equal to the supplied valuemiddot count counts the number of matching elementsmiddot transform applies a supplied function to each elementmiddot for_each sweeps over all elements does not altermiddot inner_product inner product of two vectors

5158

Template Programming

Template programming provides a lsquolanguage within C++rsquocode gets evaluated during compilation

One of the simplest template examples is

template lttypename Tgtconst Tamp min(const Tamp x const Tamp y)

return y lt x y x

This can now be used to compute the minimum between twoint variables or double or in fact any admissible typeproviding an operatorlt() for less-than comparison

5258

Template Programming

Another template example is a class squaring its argument

template lttypename Tgtclass square public stdunary_functionltTTgt public

T operator()(T t) const return tt

which can be used along with STL algorithms

transform(xbegin() xend() square)

5358

Further Reading

Books by Meyers are excellent

I also like the (free) C++ Annotations

C++ FAQ

Resources on StackOverflow such as

middot general info and its links egmiddot booklist

5458

Debugging

Debugging

Some tips

middot Generally painful old-school printf() still pervasivemiddot Debuggers go along with compilers gdb for gcc and g++lldb for the clang llvm family

middot Extra tools such as valgrind helpful for memory debuggingmiddot ldquoSanitizerrdquo (ASANUBSAN) in newer versions of g++ andclang++

5658

Best Practices

Best Practices

Version control highly recommended to become familiar withgit or svn

Editor in the long-run recommended to learn productivitytricks for one editor emacs vi eclipse RStudio hellip

5858

  • Overview
    • Why R
    • Why C++
    • Vision
    • Speed
      • What Next
      • C++ Basics
        • A Better C
          • Debugging
          • Best Practices
Page 45: Higher-Performance R via C++ - Part 1: Introductiondirk.eddelbuettel.com/papers/rcpp_uzuerich_2015_part1_intro.pdf · · C++11 is a big deal giving us new language features · While

Statically Typed

middot R is dynamically typed x lt- 314 x lt- foo is validmiddot In C++ each variable must be declared before first usemiddot Common types are int and long (possibly with unsigned)float and double bool as well as char

middot No standard string type though stdstring is closemiddot All these variables types are scalars which is fundamentally

different from R where everything is a vectormiddot class (and struct) allow creation of composite types

classes add behaviour to data to form objectsmiddot Variables need to be declared cannot change

4458

C++ Basics A Better C

C++ is a Better C

middot control structures similar to what R offers for while ifswitch

middot functions are similar too but note the difference inpositional-only matching also same function name butdifferent arguments allowed in C++

middot pointers and memory management very different but lotsof issues people had with C can be avoided via STL (whichis something Rcpp promotes too)

middot sometimes still useful to know what a pointer is hellip

4658

Object-Oriented

This is a second key feature of C++ and it does it differentlyfrom S3 and S4

struct Date unsigned int yearunsigned int monthunsigned int day

struct Person char firstname[20]char lastname[20]struct Date birthdayunsigned long id

4758

Object-Oriented

Object-orientation in the C++ sense matches data with codeoperating on it

class Date private

unsigned int yearunsigned int monthunsigned int date

publicvoid setDate(int y int m int d)int getDay()int getMonth()int getYear()

4858

Generic Programming and the STL

The STL promotes generic programming

For example the sequence container types vector dequeand list all support

middot push_back() to insert at the endmiddot pop_back() to remove from the frontmiddot begin() returning an iterator to the first elementmiddot end() returning an iterator to just after the last elementmiddot size() for the number of elements

but only list has push_front() and pop_front()

Other useful containers set multiset map and multimap4958

Generic Programming and the STL

Traversal of containers can be achieved via iterators whichrequire suitable member functions begin() and end()

stdvectorltdoublegtconst_iterator sifor (si=sbegin() si = send() si++)

stdcout ltlt si ltlt stdendl

5058

Generic Programming and the STL

Another key STL part are algorithms

double sum = accumulate(sbegin() send() 0)

Some other STL algorithms are

middot find finds the first element equal to the supplied valuemiddot count counts the number of matching elementsmiddot transform applies a supplied function to each elementmiddot for_each sweeps over all elements does not altermiddot inner_product inner product of two vectors

5158

Template Programming

Template programming provides a lsquolanguage within C++rsquocode gets evaluated during compilation

One of the simplest template examples is

template lttypename Tgtconst Tamp min(const Tamp x const Tamp y)

return y lt x y x

This can now be used to compute the minimum between twoint variables or double or in fact any admissible typeproviding an operatorlt() for less-than comparison

5258

Template Programming

Another template example is a class squaring its argument

template lttypename Tgtclass square public stdunary_functionltTTgt public

T operator()(T t) const return tt

which can be used along with STL algorithms

transform(xbegin() xend() square)

5358

Further Reading

Books by Meyers are excellent

I also like the (free) C++ Annotations

C++ FAQ

Resources on StackOverflow such as

middot general info and its links egmiddot booklist

5458

Debugging

Debugging

Some tips

middot Generally painful old-school printf() still pervasivemiddot Debuggers go along with compilers gdb for gcc and g++lldb for the clang llvm family

middot Extra tools such as valgrind helpful for memory debuggingmiddot ldquoSanitizerrdquo (ASANUBSAN) in newer versions of g++ andclang++

5658

Best Practices

Best Practices

Version control highly recommended to become familiar withgit or svn

Editor in the long-run recommended to learn productivitytricks for one editor emacs vi eclipse RStudio hellip

5858

  • Overview
    • Why R
    • Why C++
    • Vision
    • Speed
      • What Next
      • C++ Basics
        • A Better C
          • Debugging
          • Best Practices
Page 46: Higher-Performance R via C++ - Part 1: Introductiondirk.eddelbuettel.com/papers/rcpp_uzuerich_2015_part1_intro.pdf · · C++11 is a big deal giving us new language features · While

C++ Basics A Better C

C++ is a Better C

middot control structures similar to what R offers for while ifswitch

middot functions are similar too but note the difference inpositional-only matching also same function name butdifferent arguments allowed in C++

middot pointers and memory management very different but lotsof issues people had with C can be avoided via STL (whichis something Rcpp promotes too)

middot sometimes still useful to know what a pointer is hellip

4658

Object-Oriented

This is a second key feature of C++ and it does it differentlyfrom S3 and S4

struct Date unsigned int yearunsigned int monthunsigned int day

struct Person char firstname[20]char lastname[20]struct Date birthdayunsigned long id

4758

Object-Oriented

Object-orientation in the C++ sense matches data with codeoperating on it

class Date private

unsigned int yearunsigned int monthunsigned int date

publicvoid setDate(int y int m int d)int getDay()int getMonth()int getYear()

4858

Generic Programming and the STL

The STL promotes generic programming

For example the sequence container types vector dequeand list all support

middot push_back() to insert at the endmiddot pop_back() to remove from the frontmiddot begin() returning an iterator to the first elementmiddot end() returning an iterator to just after the last elementmiddot size() for the number of elements

but only list has push_front() and pop_front()

Other useful containers set multiset map and multimap4958

Generic Programming and the STL

Traversal of containers can be achieved via iterators whichrequire suitable member functions begin() and end()

stdvectorltdoublegtconst_iterator sifor (si=sbegin() si = send() si++)

stdcout ltlt si ltlt stdendl

5058

Generic Programming and the STL

Another key STL part are algorithms

double sum = accumulate(sbegin() send() 0)

Some other STL algorithms are

middot find finds the first element equal to the supplied valuemiddot count counts the number of matching elementsmiddot transform applies a supplied function to each elementmiddot for_each sweeps over all elements does not altermiddot inner_product inner product of two vectors

5158

Template Programming

Template programming provides a lsquolanguage within C++rsquocode gets evaluated during compilation

One of the simplest template examples is

template lttypename Tgtconst Tamp min(const Tamp x const Tamp y)

return y lt x y x

This can now be used to compute the minimum between twoint variables or double or in fact any admissible typeproviding an operatorlt() for less-than comparison

5258

Template Programming

Another template example is a class squaring its argument

template lttypename Tgtclass square public stdunary_functionltTTgt public

T operator()(T t) const return tt

which can be used along with STL algorithms

transform(xbegin() xend() square)

5358

Further Reading

Books by Meyers are excellent

I also like the (free) C++ Annotations

C++ FAQ

Resources on StackOverflow such as

middot general info and its links egmiddot booklist

5458

Debugging

Debugging

Some tips

middot Generally painful old-school printf() still pervasivemiddot Debuggers go along with compilers gdb for gcc and g++lldb for the clang llvm family

middot Extra tools such as valgrind helpful for memory debuggingmiddot ldquoSanitizerrdquo (ASANUBSAN) in newer versions of g++ andclang++

5658

Best Practices

Best Practices

Version control highly recommended to become familiar withgit or svn

Editor in the long-run recommended to learn productivitytricks for one editor emacs vi eclipse RStudio hellip

5858

  • Overview
    • Why R
    • Why C++
    • Vision
    • Speed
      • What Next
      • C++ Basics
        • A Better C
          • Debugging
          • Best Practices
Page 47: Higher-Performance R via C++ - Part 1: Introductiondirk.eddelbuettel.com/papers/rcpp_uzuerich_2015_part1_intro.pdf · · C++11 is a big deal giving us new language features · While

C++ is a Better C

middot control structures similar to what R offers for while ifswitch

middot functions are similar too but note the difference inpositional-only matching also same function name butdifferent arguments allowed in C++

middot pointers and memory management very different but lotsof issues people had with C can be avoided via STL (whichis something Rcpp promotes too)

middot sometimes still useful to know what a pointer is hellip

4658

Object-Oriented

This is a second key feature of C++ and it does it differentlyfrom S3 and S4

struct Date unsigned int yearunsigned int monthunsigned int day

struct Person char firstname[20]char lastname[20]struct Date birthdayunsigned long id

4758

Object-Oriented

Object-orientation in the C++ sense matches data with codeoperating on it

class Date private

unsigned int yearunsigned int monthunsigned int date

publicvoid setDate(int y int m int d)int getDay()int getMonth()int getYear()

4858

Generic Programming and the STL

The STL promotes generic programming

For example the sequence container types vector dequeand list all support

middot push_back() to insert at the endmiddot pop_back() to remove from the frontmiddot begin() returning an iterator to the first elementmiddot end() returning an iterator to just after the last elementmiddot size() for the number of elements

but only list has push_front() and pop_front()

Other useful containers set multiset map and multimap4958

Generic Programming and the STL

Traversal of containers can be achieved via iterators whichrequire suitable member functions begin() and end()

stdvectorltdoublegtconst_iterator sifor (si=sbegin() si = send() si++)

stdcout ltlt si ltlt stdendl

5058

Generic Programming and the STL

Another key STL part are algorithms

double sum = accumulate(sbegin() send() 0)

Some other STL algorithms are

middot find finds the first element equal to the supplied valuemiddot count counts the number of matching elementsmiddot transform applies a supplied function to each elementmiddot for_each sweeps over all elements does not altermiddot inner_product inner product of two vectors

5158

Template Programming

Template programming provides a lsquolanguage within C++rsquocode gets evaluated during compilation

One of the simplest template examples is

template lttypename Tgtconst Tamp min(const Tamp x const Tamp y)

return y lt x y x

This can now be used to compute the minimum between twoint variables or double or in fact any admissible typeproviding an operatorlt() for less-than comparison

5258

Template Programming

Another template example is a class squaring its argument

template lttypename Tgtclass square public stdunary_functionltTTgt public

T operator()(T t) const return tt

which can be used along with STL algorithms

transform(xbegin() xend() square)

5358

Further Reading

Books by Meyers are excellent

I also like the (free) C++ Annotations

C++ FAQ

Resources on StackOverflow such as

middot general info and its links egmiddot booklist

5458

Debugging

Debugging

Some tips

middot Generally painful old-school printf() still pervasivemiddot Debuggers go along with compilers gdb for gcc and g++lldb for the clang llvm family

middot Extra tools such as valgrind helpful for memory debuggingmiddot ldquoSanitizerrdquo (ASANUBSAN) in newer versions of g++ andclang++

5658

Best Practices

Best Practices

Version control highly recommended to become familiar withgit or svn

Editor in the long-run recommended to learn productivitytricks for one editor emacs vi eclipse RStudio hellip

5858

  • Overview
    • Why R
    • Why C++
    • Vision
    • Speed
      • What Next
      • C++ Basics
        • A Better C
          • Debugging
          • Best Practices
Page 48: Higher-Performance R via C++ - Part 1: Introductiondirk.eddelbuettel.com/papers/rcpp_uzuerich_2015_part1_intro.pdf · · C++11 is a big deal giving us new language features · While

Object-Oriented

This is a second key feature of C++ and it does it differentlyfrom S3 and S4

struct Date unsigned int yearunsigned int monthunsigned int day

struct Person char firstname[20]char lastname[20]struct Date birthdayunsigned long id

4758

Object-Oriented

Object-orientation in the C++ sense matches data with codeoperating on it

class Date private

unsigned int yearunsigned int monthunsigned int date

publicvoid setDate(int y int m int d)int getDay()int getMonth()int getYear()

4858

Generic Programming and the STL

The STL promotes generic programming

For example the sequence container types vector dequeand list all support

middot push_back() to insert at the endmiddot pop_back() to remove from the frontmiddot begin() returning an iterator to the first elementmiddot end() returning an iterator to just after the last elementmiddot size() for the number of elements

but only list has push_front() and pop_front()

Other useful containers set multiset map and multimap4958

Generic Programming and the STL

Traversal of containers can be achieved via iterators whichrequire suitable member functions begin() and end()

stdvectorltdoublegtconst_iterator sifor (si=sbegin() si = send() si++)

stdcout ltlt si ltlt stdendl

5058

Generic Programming and the STL

Another key STL part are algorithms

double sum = accumulate(sbegin() send() 0)

Some other STL algorithms are

middot find finds the first element equal to the supplied valuemiddot count counts the number of matching elementsmiddot transform applies a supplied function to each elementmiddot for_each sweeps over all elements does not altermiddot inner_product inner product of two vectors

5158

Template Programming

Template programming provides a lsquolanguage within C++rsquocode gets evaluated during compilation

One of the simplest template examples is

template lttypename Tgtconst Tamp min(const Tamp x const Tamp y)

return y lt x y x

This can now be used to compute the minimum between twoint variables or double or in fact any admissible typeproviding an operatorlt() for less-than comparison

5258

Template Programming

Another template example is a class squaring its argument

template lttypename Tgtclass square public stdunary_functionltTTgt public

T operator()(T t) const return tt

which can be used along with STL algorithms

transform(xbegin() xend() square)

5358

Further Reading

Books by Meyers are excellent

I also like the (free) C++ Annotations

C++ FAQ

Resources on StackOverflow such as

middot general info and its links egmiddot booklist

5458

Debugging

Debugging

Some tips

middot Generally painful old-school printf() still pervasivemiddot Debuggers go along with compilers gdb for gcc and g++lldb for the clang llvm family

middot Extra tools such as valgrind helpful for memory debuggingmiddot ldquoSanitizerrdquo (ASANUBSAN) in newer versions of g++ andclang++

5658

Best Practices

Best Practices

Version control highly recommended to become familiar withgit or svn

Editor in the long-run recommended to learn productivitytricks for one editor emacs vi eclipse RStudio hellip

5858

  • Overview
    • Why R
    • Why C++
    • Vision
    • Speed
      • What Next
      • C++ Basics
        • A Better C
          • Debugging
          • Best Practices
Page 49: Higher-Performance R via C++ - Part 1: Introductiondirk.eddelbuettel.com/papers/rcpp_uzuerich_2015_part1_intro.pdf · · C++11 is a big deal giving us new language features · While

Object-Oriented

Object-orientation in the C++ sense matches data with codeoperating on it

class Date private

unsigned int yearunsigned int monthunsigned int date

publicvoid setDate(int y int m int d)int getDay()int getMonth()int getYear()

4858

Generic Programming and the STL

The STL promotes generic programming

For example the sequence container types vector dequeand list all support

middot push_back() to insert at the endmiddot pop_back() to remove from the frontmiddot begin() returning an iterator to the first elementmiddot end() returning an iterator to just after the last elementmiddot size() for the number of elements

but only list has push_front() and pop_front()

Other useful containers set multiset map and multimap4958

Generic Programming and the STL

Traversal of containers can be achieved via iterators whichrequire suitable member functions begin() and end()

stdvectorltdoublegtconst_iterator sifor (si=sbegin() si = send() si++)

stdcout ltlt si ltlt stdendl

5058

Generic Programming and the STL

Another key STL part are algorithms

double sum = accumulate(sbegin() send() 0)

Some other STL algorithms are

middot find finds the first element equal to the supplied valuemiddot count counts the number of matching elementsmiddot transform applies a supplied function to each elementmiddot for_each sweeps over all elements does not altermiddot inner_product inner product of two vectors

5158

Template Programming

Template programming provides a lsquolanguage within C++rsquocode gets evaluated during compilation

One of the simplest template examples is

template lttypename Tgtconst Tamp min(const Tamp x const Tamp y)

return y lt x y x

This can now be used to compute the minimum between twoint variables or double or in fact any admissible typeproviding an operatorlt() for less-than comparison

5258

Template Programming

Another template example is a class squaring its argument

template lttypename Tgtclass square public stdunary_functionltTTgt public

T operator()(T t) const return tt

which can be used along with STL algorithms

transform(xbegin() xend() square)

5358

Further Reading

Books by Meyers are excellent

I also like the (free) C++ Annotations

C++ FAQ

Resources on StackOverflow such as

middot general info and its links egmiddot booklist

5458

Debugging

Debugging

Some tips

middot Generally painful old-school printf() still pervasivemiddot Debuggers go along with compilers gdb for gcc and g++lldb for the clang llvm family

middot Extra tools such as valgrind helpful for memory debuggingmiddot ldquoSanitizerrdquo (ASANUBSAN) in newer versions of g++ andclang++

5658

Best Practices

Best Practices

Version control highly recommended to become familiar withgit or svn

Editor in the long-run recommended to learn productivitytricks for one editor emacs vi eclipse RStudio hellip

5858

  • Overview
    • Why R
    • Why C++
    • Vision
    • Speed
      • What Next
      • C++ Basics
        • A Better C
          • Debugging
          • Best Practices
Page 50: Higher-Performance R via C++ - Part 1: Introductiondirk.eddelbuettel.com/papers/rcpp_uzuerich_2015_part1_intro.pdf · · C++11 is a big deal giving us new language features · While

Generic Programming and the STL

The STL promotes generic programming

For example the sequence container types vector dequeand list all support

middot push_back() to insert at the endmiddot pop_back() to remove from the frontmiddot begin() returning an iterator to the first elementmiddot end() returning an iterator to just after the last elementmiddot size() for the number of elements

but only list has push_front() and pop_front()

Other useful containers set multiset map and multimap4958

Generic Programming and the STL

Traversal of containers can be achieved via iterators whichrequire suitable member functions begin() and end()

stdvectorltdoublegtconst_iterator sifor (si=sbegin() si = send() si++)

stdcout ltlt si ltlt stdendl

5058

Generic Programming and the STL

Another key STL part are algorithms

double sum = accumulate(sbegin() send() 0)

Some other STL algorithms are

middot find finds the first element equal to the supplied valuemiddot count counts the number of matching elementsmiddot transform applies a supplied function to each elementmiddot for_each sweeps over all elements does not altermiddot inner_product inner product of two vectors

5158

Template Programming

Template programming provides a lsquolanguage within C++rsquocode gets evaluated during compilation

One of the simplest template examples is

template lttypename Tgtconst Tamp min(const Tamp x const Tamp y)

return y lt x y x

This can now be used to compute the minimum between twoint variables or double or in fact any admissible typeproviding an operatorlt() for less-than comparison

5258

Template Programming

Another template example is a class squaring its argument

template lttypename Tgtclass square public stdunary_functionltTTgt public

T operator()(T t) const return tt

which can be used along with STL algorithms

transform(xbegin() xend() square)

5358

Further Reading

Books by Meyers are excellent

I also like the (free) C++ Annotations

C++ FAQ

Resources on StackOverflow such as

middot general info and its links egmiddot booklist

5458

Debugging

Debugging

Some tips

middot Generally painful old-school printf() still pervasivemiddot Debuggers go along with compilers gdb for gcc and g++lldb for the clang llvm family

middot Extra tools such as valgrind helpful for memory debuggingmiddot ldquoSanitizerrdquo (ASANUBSAN) in newer versions of g++ andclang++

5658

Best Practices

Best Practices

Version control highly recommended to become familiar withgit or svn

Editor in the long-run recommended to learn productivitytricks for one editor emacs vi eclipse RStudio hellip

5858

  • Overview
    • Why R
    • Why C++
    • Vision
    • Speed
      • What Next
      • C++ Basics
        • A Better C
          • Debugging
          • Best Practices
Page 51: Higher-Performance R via C++ - Part 1: Introductiondirk.eddelbuettel.com/papers/rcpp_uzuerich_2015_part1_intro.pdf · · C++11 is a big deal giving us new language features · While

Generic Programming and the STL

Traversal of containers can be achieved via iterators whichrequire suitable member functions begin() and end()

stdvectorltdoublegtconst_iterator sifor (si=sbegin() si = send() si++)

stdcout ltlt si ltlt stdendl

5058

Generic Programming and the STL

Another key STL part are algorithms

double sum = accumulate(sbegin() send() 0)

Some other STL algorithms are

middot find finds the first element equal to the supplied valuemiddot count counts the number of matching elementsmiddot transform applies a supplied function to each elementmiddot for_each sweeps over all elements does not altermiddot inner_product inner product of two vectors

5158

Template Programming

Template programming provides a lsquolanguage within C++rsquocode gets evaluated during compilation

One of the simplest template examples is

template lttypename Tgtconst Tamp min(const Tamp x const Tamp y)

return y lt x y x

This can now be used to compute the minimum between twoint variables or double or in fact any admissible typeproviding an operatorlt() for less-than comparison

5258

Template Programming

Another template example is a class squaring its argument

template lttypename Tgtclass square public stdunary_functionltTTgt public

T operator()(T t) const return tt

which can be used along with STL algorithms

transform(xbegin() xend() square)

5358

Further Reading

Books by Meyers are excellent

I also like the (free) C++ Annotations

C++ FAQ

Resources on StackOverflow such as

middot general info and its links egmiddot booklist

5458

Debugging

Debugging

Some tips

middot Generally painful old-school printf() still pervasivemiddot Debuggers go along with compilers gdb for gcc and g++lldb for the clang llvm family

middot Extra tools such as valgrind helpful for memory debuggingmiddot ldquoSanitizerrdquo (ASANUBSAN) in newer versions of g++ andclang++

5658

Best Practices

Best Practices

Version control highly recommended to become familiar withgit or svn

Editor in the long-run recommended to learn productivitytricks for one editor emacs vi eclipse RStudio hellip

5858

  • Overview
    • Why R
    • Why C++
    • Vision
    • Speed
      • What Next
      • C++ Basics
        • A Better C
          • Debugging
          • Best Practices
Page 52: Higher-Performance R via C++ - Part 1: Introductiondirk.eddelbuettel.com/papers/rcpp_uzuerich_2015_part1_intro.pdf · · C++11 is a big deal giving us new language features · While

Generic Programming and the STL

Another key STL part are algorithms

double sum = accumulate(sbegin() send() 0)

Some other STL algorithms are

middot find finds the first element equal to the supplied valuemiddot count counts the number of matching elementsmiddot transform applies a supplied function to each elementmiddot for_each sweeps over all elements does not altermiddot inner_product inner product of two vectors

5158

Template Programming

Template programming provides a lsquolanguage within C++rsquocode gets evaluated during compilation

One of the simplest template examples is

template lttypename Tgtconst Tamp min(const Tamp x const Tamp y)

return y lt x y x

This can now be used to compute the minimum between twoint variables or double or in fact any admissible typeproviding an operatorlt() for less-than comparison

5258

Template Programming

Another template example is a class squaring its argument

template lttypename Tgtclass square public stdunary_functionltTTgt public

T operator()(T t) const return tt

which can be used along with STL algorithms

transform(xbegin() xend() square)

5358

Further Reading

Books by Meyers are excellent

I also like the (free) C++ Annotations

C++ FAQ

Resources on StackOverflow such as

middot general info and its links egmiddot booklist

5458

Debugging

Debugging

Some tips

middot Generally painful old-school printf() still pervasivemiddot Debuggers go along with compilers gdb for gcc and g++lldb for the clang llvm family

middot Extra tools such as valgrind helpful for memory debuggingmiddot ldquoSanitizerrdquo (ASANUBSAN) in newer versions of g++ andclang++

5658

Best Practices

Best Practices

Version control highly recommended to become familiar withgit or svn

Editor in the long-run recommended to learn productivitytricks for one editor emacs vi eclipse RStudio hellip

5858

  • Overview
    • Why R
    • Why C++
    • Vision
    • Speed
      • What Next
      • C++ Basics
        • A Better C
          • Debugging
          • Best Practices
Page 53: Higher-Performance R via C++ - Part 1: Introductiondirk.eddelbuettel.com/papers/rcpp_uzuerich_2015_part1_intro.pdf · · C++11 is a big deal giving us new language features · While

Template Programming

Template programming provides a lsquolanguage within C++rsquocode gets evaluated during compilation

One of the simplest template examples is

template lttypename Tgtconst Tamp min(const Tamp x const Tamp y)

return y lt x y x

This can now be used to compute the minimum between twoint variables or double or in fact any admissible typeproviding an operatorlt() for less-than comparison

5258

Template Programming

Another template example is a class squaring its argument

template lttypename Tgtclass square public stdunary_functionltTTgt public

T operator()(T t) const return tt

which can be used along with STL algorithms

transform(xbegin() xend() square)

5358

Further Reading

Books by Meyers are excellent

I also like the (free) C++ Annotations

C++ FAQ

Resources on StackOverflow such as

middot general info and its links egmiddot booklist

5458

Debugging

Debugging

Some tips

middot Generally painful old-school printf() still pervasivemiddot Debuggers go along with compilers gdb for gcc and g++lldb for the clang llvm family

middot Extra tools such as valgrind helpful for memory debuggingmiddot ldquoSanitizerrdquo (ASANUBSAN) in newer versions of g++ andclang++

5658

Best Practices

Best Practices

Version control highly recommended to become familiar withgit or svn

Editor in the long-run recommended to learn productivitytricks for one editor emacs vi eclipse RStudio hellip

5858

  • Overview
    • Why R
    • Why C++
    • Vision
    • Speed
      • What Next
      • C++ Basics
        • A Better C
          • Debugging
          • Best Practices
Page 54: Higher-Performance R via C++ - Part 1: Introductiondirk.eddelbuettel.com/papers/rcpp_uzuerich_2015_part1_intro.pdf · · C++11 is a big deal giving us new language features · While

Template Programming

Another template example is a class squaring its argument

template lttypename Tgtclass square public stdunary_functionltTTgt public

T operator()(T t) const return tt

which can be used along with STL algorithms

transform(xbegin() xend() square)

5358

Further Reading

Books by Meyers are excellent

I also like the (free) C++ Annotations

C++ FAQ

Resources on StackOverflow such as

middot general info and its links egmiddot booklist

5458

Debugging

Debugging

Some tips

middot Generally painful old-school printf() still pervasivemiddot Debuggers go along with compilers gdb for gcc and g++lldb for the clang llvm family

middot Extra tools such as valgrind helpful for memory debuggingmiddot ldquoSanitizerrdquo (ASANUBSAN) in newer versions of g++ andclang++

5658

Best Practices

Best Practices

Version control highly recommended to become familiar withgit or svn

Editor in the long-run recommended to learn productivitytricks for one editor emacs vi eclipse RStudio hellip

5858

  • Overview
    • Why R
    • Why C++
    • Vision
    • Speed
      • What Next
      • C++ Basics
        • A Better C
          • Debugging
          • Best Practices
Page 55: Higher-Performance R via C++ - Part 1: Introductiondirk.eddelbuettel.com/papers/rcpp_uzuerich_2015_part1_intro.pdf · · C++11 is a big deal giving us new language features · While

Further Reading

Books by Meyers are excellent

I also like the (free) C++ Annotations

C++ FAQ

Resources on StackOverflow such as

middot general info and its links egmiddot booklist

5458

Debugging

Debugging

Some tips

middot Generally painful old-school printf() still pervasivemiddot Debuggers go along with compilers gdb for gcc and g++lldb for the clang llvm family

middot Extra tools such as valgrind helpful for memory debuggingmiddot ldquoSanitizerrdquo (ASANUBSAN) in newer versions of g++ andclang++

5658

Best Practices

Best Practices

Version control highly recommended to become familiar withgit or svn

Editor in the long-run recommended to learn productivitytricks for one editor emacs vi eclipse RStudio hellip

5858

  • Overview
    • Why R
    • Why C++
    • Vision
    • Speed
      • What Next
      • C++ Basics
        • A Better C
          • Debugging
          • Best Practices
Page 56: Higher-Performance R via C++ - Part 1: Introductiondirk.eddelbuettel.com/papers/rcpp_uzuerich_2015_part1_intro.pdf · · C++11 is a big deal giving us new language features · While

Debugging

Debugging

Some tips

middot Generally painful old-school printf() still pervasivemiddot Debuggers go along with compilers gdb for gcc and g++lldb for the clang llvm family

middot Extra tools such as valgrind helpful for memory debuggingmiddot ldquoSanitizerrdquo (ASANUBSAN) in newer versions of g++ andclang++

5658

Best Practices

Best Practices

Version control highly recommended to become familiar withgit or svn

Editor in the long-run recommended to learn productivitytricks for one editor emacs vi eclipse RStudio hellip

5858

  • Overview
    • Why R
    • Why C++
    • Vision
    • Speed
      • What Next
      • C++ Basics
        • A Better C
          • Debugging
          • Best Practices
Page 57: Higher-Performance R via C++ - Part 1: Introductiondirk.eddelbuettel.com/papers/rcpp_uzuerich_2015_part1_intro.pdf · · C++11 is a big deal giving us new language features · While

Debugging

Some tips

middot Generally painful old-school printf() still pervasivemiddot Debuggers go along with compilers gdb for gcc and g++lldb for the clang llvm family

middot Extra tools such as valgrind helpful for memory debuggingmiddot ldquoSanitizerrdquo (ASANUBSAN) in newer versions of g++ andclang++

5658

Best Practices

Best Practices

Version control highly recommended to become familiar withgit or svn

Editor in the long-run recommended to learn productivitytricks for one editor emacs vi eclipse RStudio hellip

5858

  • Overview
    • Why R
    • Why C++
    • Vision
    • Speed
      • What Next
      • C++ Basics
        • A Better C
          • Debugging
          • Best Practices
Page 58: Higher-Performance R via C++ - Part 1: Introductiondirk.eddelbuettel.com/papers/rcpp_uzuerich_2015_part1_intro.pdf · · C++11 is a big deal giving us new language features · While

Best Practices

Best Practices

Version control highly recommended to become familiar withgit or svn

Editor in the long-run recommended to learn productivitytricks for one editor emacs vi eclipse RStudio hellip

5858

  • Overview
    • Why R
    • Why C++
    • Vision
    • Speed
      • What Next
      • C++ Basics
        • A Better C
          • Debugging
          • Best Practices
Page 59: Higher-Performance R via C++ - Part 1: Introductiondirk.eddelbuettel.com/papers/rcpp_uzuerich_2015_part1_intro.pdf · · C++11 is a big deal giving us new language features · While

Best Practices

Version control highly recommended to become familiar withgit or svn

Editor in the long-run recommended to learn productivitytricks for one editor emacs vi eclipse RStudio hellip

5858

  • Overview
    • Why R
    • Why C++
    • Vision
    • Speed
      • What Next
      • C++ Basics
        • A Better C
          • Debugging
          • Best Practices