Outline - Uppsala University, user.it.uu.se/~carln/HPC2015_carln1.pdf

C/C++ Carl Nettelblad 2015-11-24

Outline

• Languages

• Cases:

– Printing lists

– Sorting lists

• The discussion will include:

– Templates vs. inheritance

Why is C a good language?

• Fast

• Nothing is hidden

• “Lingua franca”

– Runs everywhere

• For any type of program

• Any kind of distributed/parallel computing

– Can interact with anything

• Compiled and static typing

Why is C a bad language?

• Tedious

– Easily getting stuck in “how”, not “what”

• Long iteration times

– Rebuild after a simple bug

• Unsafe

– Bugs can be devastating

– For scientific codes:

• Complex bugs can stay hidden for a long time

Why is Python a good language?

• Flexible

– Different abstractions

• Concise

• Good libraries for scientific and non-scientific purposes

• Easy to use for interactive and quick prototyping

Why is Python a bad language?

• Slow

– Default version is interpreted

– Does not scale well with threads (the global interpreter lock)

• Flexibility can promote bad habits

– Hard to guarantee that all parts are used consistently when changes are made

• (Indentation carrying semantic meaning)

What do we want?

• Flexible abstractions

• Good and predictable libraries

• High performance

• Easy interactivity

• Type safety

• This language could be C++!

– Or Python with a mix of C++

Python from Matlab

• Matlab R2014b (8.4) and later support direct Python integration

• py.module.function

• E.g. access the highly accurate summation function fsum as py.math.fsum

• Works straight away; flat vectors (not matrices) are automatically translated back and forth

• Any Python module built in this course can then be accessed from Matlab

Python from the web

• IPython platform for interactive Python

• IPython Notebook, web-based interface to IPython

– Combine code, text, and figures

• Kind of like Mathematica

– Easily edit different code snippets

– Press Shift+Enter to (re)compute

C++ from the web

• Jupyter project

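– IPython is being split into a language-independent interactivity engine and the actual Python part

– Other languages, such as C++, can then plug into the same engine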

cling and clang

• At CERN, the ROOT framework has existed for a long time

– Special classes and an interpreter for a language similar to C++

– With several oddities

– The interpreted language is truly slow

• Effort to rebuild this using “real” C++

– cling: an interactive C++ interpreter built on clang, which compiles the code just in time

– clang is the C++ compiler currently used by Apple

clang

• g++/gcc has been the de facto standard among (open-source) C/C++ compilers for a long time

• gcc has an archaic codebase

– Historically not easy to hook into from other tools

– E.g. to get a parse tree

– Or to add new code on the fly to an ongoing compilation process

• Other compilers are closed source

– And also tend to lack flexible APIs

• clang is modular (a front end to the separate LLVM back end) and open source

Other users of clang

• In addition to cling and the Apple compilers, clang and its LLVM back end are found in e.g.

– The Nvidia CUDA device compiler, which is LLVM-based no matter what host compiler you have

– The IDE Ceemple, which bundles many C++ libraries and offers a separate compilation mode with very short latency, is based on clang

• It keeps the compiler loaded, with all headers, between reruns

Working on an array in C

```c
#include <stdio.h>

void printIntArray(int* data, int size)
{
    for (int i = 0; i < size; i++)
    {
        printf("%d\n", data[i]);
    }
}
```

Why is this bad?

• Adapted to one specific type of data (int)

• Size is an explicit parameter

– If size is specified incorrectly, we will read invalid data

• The function can easily change the data (nothing is const)

• Data pointer can be invalid

What would this look like in Python?

```python
def printArray(array):
    for i in array:
        print(i)
```

What would this look like in C++?

```cpp
// IntVector: a vector class assumed to provide size() and get(i)
void printIntVector(IntVector* vector)
{
    for (int i = 0; i < vector->size(); i++)
    {
        printf("%d\n", vector->get(i));
    }
}
```

What would this look like in C++?

```cpp
#include <iostream>
#include <vector>
using namespace std;

void printIntVector(vector<int>::iterator begin, vector<int>::iterator end)
{
    for (vector<int>::iterator i = begin; i != end; i++)
    {
        cout << *i << "\n";
    }
}
```

The inheritance abstraction

• An iterator would be a common interface or base class

• This is the case in e.g. Java

• Subclasses inherit from this base class

– Each performing iteration over a specific data structure

– Virtual methods for getting the next element, the current element, etc.

• (Runtime) polymorphism
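A minimal C++ sketch of this Java-style design, using the class names from the vtable diagram that follows (Iterator, IntVectorIterator). The exact method set (atEnd(), get(), next()) and the helper sumAll() are illustrative assumptions, not a standard interface.

```cpp
#include <cstddef>
#include <vector>

// Abstract base class: the "contract" every iterator must fulfil
struct Iterator {
    virtual ~Iterator() = default;
    virtual bool atEnd() const = 0;  // no more elements?
    virtual int get() const = 0;     // current element
    virtual void next() = 0;         // advance to the next element
};

// One concrete subclass per data structure
struct IntVectorIterator : Iterator {
    const std::vector<int>& data;
    std::size_t pos = 0;
    explicit IntVectorIterator(const std::vector<int>& d) : data(d) {}
    bool atEnd() const override { return pos >= data.size(); }
    int get() const override { return data[pos]; }
    void next() override { ++pos; }
};

// Works for any Iterator subclass; every call below is a virtual call
long long sumAll(Iterator& it) {
    long long total = 0;
    for (; !it.atEnd(); it.next()) total += it.get();
    return total;
}
```

The price of this flexibility is that sumAll cannot know at compile time which get() it will run.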

How is the method call made?

• Each object holds a pointer to its class's table of method implementations (the vtable)

• The slot numbers are fixed at compilation

– Any call to an Iterator method becomes “call the method pointed to in the right slot in the vtable”

– This is an indirect jump

[Figure: the vtables of Iterator and IntVectorIterator, each with slots for next() and get()]
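To make the slot mechanism concrete, here is a hand-written imitation of what a compiler generates for virtual calls, using plain function pointers. All names are hypothetical, and real vtable layouts are defined by the platform ABI, so treat this as a sketch of the idea rather than of any compiler's actual output.

```cpp
// Hand-rolled "vtable": one table of function pointers per class,
// and one pointer to that table inside every object.
struct IteratorObj;

struct IteratorVTable {
    void (*next)(IteratorObj* self);      // slot 0
    int (*get)(const IteratorObj* self);  // slot 1
};

struct IteratorObj {
    const IteratorVTable* vptr;  // set when the object is created
};

// A "subclass": the base part must come first so the casts below are valid
struct IntArrayIterator {
    IteratorObj base;
    const int* pos;
};

static void intArrayNext(IteratorObj* self) {
    reinterpret_cast<IntArrayIterator*>(self)->pos++;
}

static int intArrayGet(const IteratorObj* self) {
    return *reinterpret_cast<const IntArrayIterator*>(self)->pos;
}

// The table is shared by all IntArrayIterator objects
static const IteratorVTable intArrayVTable = {intArrayNext, intArrayGet};

IntArrayIterator makeIntArrayIterator(const int* start) {
    return IntArrayIterator{{&intArrayVTable}, start};
}

// The caller only knows the slot numbers: load the table pointer,
// load the function pointer from the slot, then make an indirect call
int getThenAdvance(IteratorObj* it) {
    int value = it->vptr->get(it);  // slot 1
    it->vptr->next(it);             // slot 0
    return value;
}
```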

Indirect jumps

• A modern fast CPU is pipelined and out of order

– Multiple instructions “in flight” at once

– If an instruction has to wait for one it depends on, an out-of-order core starts executing a later, independent one

– Pipeline depth of about 20 stages

• Out-of-order window of 224 instructions in recent Intel CPUs

– Hides waiting on memory

– Latency is the difference between real and theoretical performance

[Figure: pipeline diagram with staggered rows of instructions (ADD, MOV, CMP, JNZ, …) in flight at the same time]

Branch prediction

• Out-of-order execution works fine if the instruction stream is known

• If you have a loop or an if statement, the CPU has to guess

– The guesses can actually get pretty good

• A virtual method call is another branch

– In the very worst case, the target instructions are not even cached
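A small experiment in the spirit of this slide, as a sketch: sumLarge contains a data-dependent if statement. On random bytes the predictor can only guess right about half the time, while on sorted data the outcome flips once. The threshold, the helper names, and the timing helper are illustrative assumptions; an optimizing compiler may also replace the branch with a branch-free conditional move or vectorized code, which hides the effect.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Sums the elements >= 128; the if statement is a data-dependent branch
std::int64_t sumLarge(const std::vector<int>& data) {
    std::int64_t sum = 0;
    for (int x : data) {
        if (x >= 128) sum += x;
    }
    return sum;
}

// Uniformly random values in [0, 255]
std::vector<int> randomBytes(std::size_t n, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> dist(0, 255);
    std::vector<int> v(n);
    for (auto& x : v) x = dist(gen);
    return v;
}

// Wall-clock seconds taken by one call of f
template <typename F>
double secondsFor(F f) {
    auto t0 = std::chrono::steady_clock::now();
    f();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}
```

To try it, time sumLarge on a few million random bytes, then sort the same vector and time it again; print or otherwise use the sums so the compiler cannot drop the calls.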

Virtual methods in the compiler

• When you call a function directly in C, the compiler can see everything that happens

– It can inline the function

– Move instructions around

– Do all the optimizations that make modern compiled code fast, across the function call

• The virtual method call breaks this

– Sometimes the compiler can identify that the same implementation is always used (devirtualization)
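One way to help the compiler here, sketched with hypothetical Shape and Square classes: marking a class final (C++11) guarantees that no further overrides exist, so a call made through a Square has only one possible target and can be devirtualized and inlined. Whether a given compiler actually does so depends on its version and optimization level.

```cpp
#include <vector>

struct Shape {
    virtual ~Shape() = default;
    virtual double area() const = 0;
};

// 'final': nothing can derive from Square, so Square::area cannot be overridden
struct Square final : Shape {
    double side;
    explicit Square(double s) : side(s) {}
    double area() const override { return side * side; }
};

// Static type is Square: the compiler knows the exact implementation
double totalAreaSquares(const std::vector<Square>& squares) {
    double total = 0.0;
    for (const Square& s : squares) total += s.area();
    return total;
}

// Static type is Shape: in general an indirect call through the vtable
double totalAreaShapes(const std::vector<const Shape*>& shapes) {
    double total = 0.0;
    for (const Shape* s : shapes) total += s->area();
    return total;
}
```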

The duck-typing abstraction

• Python uses the concept of duck typing

– “If it walks like a duck, swims like a duck, quacks like a duck, it is a duck”

– “If an object has all the methods of an iterator, it is an iterator”

• Convenient, flexible

– You can use inheritance, but you don’t rely on it to define the contract

• Functions are looked up by name in a data structure when they are called

– C++ vtables suddenly seem superfast

C++ templates

• Create functions and classes that can work on arbitrary types

• Simple motivation

– Type-safe container classes

• vector<int>

• map<int, double>

• Templates are instantiated at compile time

• Compiler error messages can be hard to track

– Templates within templates within templates

– Compare this to a sudden error at runtime
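A minimal class template in the spirit of vector<int>, as a sketch: Stack is a hypothetical name, built on std::vector for brevity. The compiler generates a separate, fully typed class for each T it is used with, so pushing a string literal onto a Stack<int> is rejected at compile time rather than failing at run time.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical type-safe container: one class is generated per element type T
template <typename T>
class Stack {
public:
    void push(const T& value) { items.push_back(value); }

    T pop() {
        if (items.empty()) throw std::out_of_range("pop on empty Stack");
        T top = items.back();
        items.pop_back();
        return top;
    }

    std::size_t size() const { return items.size(); }

private:
    std::vector<T> items;
};

// Stack<int> s; s.push("text");  // would not compile: no conversion to int
```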

Printing a list

```cpp
#include <cstdio>

template<typename T>
void printList(T begin, T end)
{
    for (T i = begin; i != end; i++)
    {
        printf("%d\n", static_cast<int>(*i));
    }
}
```

What happened here

• We are doing duck-typing in C++

• We don’t know what T is

– But begin and end are of the same type

– We can get a value with the dereference (*) operator

– That value can be cast to an int

– We can advance to the next value with ++

• All of this is done at compile time

– Performance

– Correctness

Abstraction costs

• For a simple array, this is just as fast as the C version

– That code could only handle pointer-based int arrays

– The template version also works on binary trees (set), or a network stream

• For performance, you want to keep the runtime costs of the generalizations and abstractions you make to a minimum
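To illustrate, here is a sketch of the same iterator-pair template idea applied to a raw array, a std::set (a binary tree), and a text stream read through std::istream_iterator, which stands in for a network stream. sumList and demo are hypothetical names; sumList returns a sum instead of printing so the result is easy to check.

```cpp
#include <iterator>
#include <set>
#include <sstream>

// Same duck-typed contract as printList: begin/end of one type T, with *, ++ and !=
template <typename T>
long long sumList(T begin, T end) {
    long long total = 0;
    for (T i = begin; i != end; i++) {
        total += static_cast<long long>(*i);
    }
    return total;
}

// Three very different data sources, one template
long long demo() {
    int arr[] = {1, 2, 3};
    std::set<int> tree = {10, 20, 30};
    std::istringstream in("4 5 6");
    return sumList(arr, arr + 3)                        // T = int*
         + sumList(tree.begin(), tree.end())            // T = set iterator
         + sumList(std::istream_iterator<int>(in),
                   std::istream_iterator<int>());       // T = stream iterator
}
```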

Printing a list

```cpp
#include <cstdio>

template<typename T>
void printList(T begin, T end)
{
    for (auto i = begin; i != end; i++)
    {
        printf("%d\n", static_cast<int>(*i));
    }
}
```

Printing a list

```cpp
#include <cstdio>

template<typename T>
void printList(const T& list)
{
    for (auto i : list)
    {
        printf("%d\n", static_cast<int>(i));
    }
}
```

Printing a list

```cpp
#include <iostream>
using namespace std;

template<typename T>
void printList(const T& list)
{
    for (auto i : list)
    {
        cout << i << "\n";
    }
}
```

Consequences

• auto keyword

– For local variables, you frequently don’t really care about the type; there is no “contract”

– The full type name could change if you change data structures later on

– Just let the compiler figure it out

• const &

– C and C++ pass all parameters by value by default

– Passing a full vector to a function by value means copying the whole vector

– const means “I don’t want to be able to change this object by accident”

– & means “I want to work on the original object, not a copy”

– These are semantic differences
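A sketch that makes the copying visible: CopyCounter is a hypothetical type that counts its own copies. Passing a vector of them by value copies every element, while passing by const reference copies nothing and still prevents the function from modifying the caller's vector.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper type that counts how many times it has been copied
struct CopyCounter {
    static int copies;
    CopyCounter() = default;
    CopyCounter(const CopyCounter&) { ++copies; }
    CopyCounter& operator=(const CopyCounter&) { ++copies; return *this; }
};
int CopyCounter::copies = 0;

// By value: the vector, and every element in it, is copied on each call
std::size_t countByValue(std::vector<CopyCounter> v) { return v.size(); }

// By const reference: no copy, and v.push_back(...) here would not compile
std::size_t countByConstRef(const std::vector<CopyCounter>& v) { return v.size(); }
```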

Consequences

• for (auto i : list)

– Simple “for each” notation

– Under the hood relying on iterators

– But you can do stuff like

```cpp
for (auto x : map<int,int>{{1,2}, {3,5}}) {
    printf("%d %d\n", x.first, x.second);
}
```

• You simply can’t accidentally go outside the range with this syntax
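Roughly what the range-based for loop expands to, sketched as two equivalent functions over the map from the example above (the function names are hypothetical). The standard specifies the exact rewrite in more detail, for instance it first binds the range to a reference, so the iterator version is a simplification.

```cpp
#include <map>

// Range-based for: no index, no explicit end test to get wrong
int weightedSumRangeFor(const std::map<int, int>& m) {
    int total = 0;
    for (const auto& x : m) total += x.first * x.second;
    return total;
}

// Simplified version of what the compiler generates for the loop above
int weightedSumIterators(const std::map<int, int>& m) {
    int total = 0;
    for (auto it = m.begin(), end = m.end(); it != end; ++it) {
        const auto& x = *it;
        total += x.first * x.second;
    }
    return total;
}
```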

Give your code a Boost

• The C++ standard library is rather thin

– It has become larger in the last few standards

– You want to interact with the underlying tech (the OS), not a library faking the OS

– OS libraries are rarely nice C++…

• It also lacks many general algorithms and abstractions

• The Boost library (or library of libraries) changes this

Boost

• Independent project

– Started at the end of the last millennium

– Libraries are added after a peer-review process focusing on generality and a “nice interface”

– Varying quality

• Far fewer, but far more stable, than arbitrary Perl, Python, or R libraries

• Great influence on the C++ standards process

– The TR1 document between C++03 and C++11 based several new libraries on their Boost counterparts

– C++11 continued this

– C++11 also added language features based on “things Boost could not achieve”

What do we have in Boost?

• Accumulators, Algorithm, Align, Any, Array, Asio, Assert, Assign, Atomic, Bimap, Bind, Call Traits, Chrono, Circular Buffer, Compatibility, Compressed Pair, Concept Check, Config, Container, Context, Conversion, Convert, Core, Coroutine, Coroutine2, CRC, Date Time, Dynamic Bitset, Enable If, Endian, Exception, Filesystem, Flyweight, Foreach, Format, Function, Function Types, Functional, Fusion, Geometry, GIL, Graph, Heap, ICL, Identity Type, In Place Factory, Integer, Interprocess, Interval, Intrusive, IO State Savers, Iostreams, Iterator, Lambda, Lexical Cast, Local Function, Locale, Lockfree, Log, Math, Member Function, Meta State Machine, Min-Max, MPI, MPL, Multi-Array, Multi-Index, Multiprecision, Numeric Conversion, Odeint, Operators, Optional, Parameter, Phoenix, Pointer Container, Polygon, Pool, Predef, Preprocessor, Program Options, Property Map, Property Tree, Random, Range, Ratio, Rational, Ref, Regex, Result Of, Scope Exit, Serialization, Signals, Signals2, Smart Ptr, Sort, Spirit, Statechart, Static Assert, String Algo, Swap, System, Test, Thread, ThrowException, Timer, Tokenizer, TR1, Tribool, TTI, Tuple, Type Index, Type Traits, Typeof, uBLAS, Units, Unordered, Utility, Uuid, Value Initialized, Variant, Wave, Xpressive

Python and C++

• When you integrate languages with each other, you need to define:

– Who are you?

– Who are your users?

– Which language is extending the bridge into the other?

– What features of the two languages need to be maintained in the bridge?

– Do you have performance concerns?

Cython

• There are many ways to create bindings between Python and other languages

• Cython generates C or C++ code from (optionally type-annotated) Python code

– Can call into C++ with some work

– The C++ declarations have to be restated in Cython syntax so that Cython understands them

– The generated C++ code also needs to compile correctly

• Do not confuse Cython with CPython (the standard Python implementation)

Performance of Cython

• Code can be annotated with exact types

– Allows more optimizations

– Tight loops can be quick

• Still plagued by some of the indirection problems of Python

– Just as fast as C code interacting closely with Python

– Not as fast as C/C++ code with full control over data structures

– Transitions between C and Cython code are very quick

Cython C++ wrapping

```cpp
// Rectangle.h
class Rectangle {
public:
    int x0, y0, x1, y1;
    Rectangle(int x0, int y0, int x1, int y1);
    ~Rectangle();
    int getLength();
    int getHeight();
    int getArea();
    void move(int dx, int dy);
};
```
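The slides only show the declaration. For experimenting with either wrapper, a header-only version of the same interface could look like this; the member definitions are an assumption (length = x1 - x0, height = y1 - y0), not taken from the original.

```cpp
// Header-only sketch of Rectangle.h; the formulas below are assumptions
class Rectangle {
public:
    int x0, y0, x1, y1;
    Rectangle(int x0, int y0, int x1, int y1)
        : x0(x0), y0(y0), x1(x1), y1(y1) {}
    ~Rectangle() {}
    int getLength() { return x1 - x0; }
    int getHeight() { return y1 - y0; }
    int getArea() { return getLength() * getHeight(); }
    // Translate the whole rectangle; its size does not change
    void move(int dx, int dy) {
        x0 += dx; x1 += dx;
        y0 += dy; y1 += dy;
    }
};
```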

Wrapping to Cython

```cython
cdef extern from "Rectangle.h":
    cdef cppclass Rectangle:
        Rectangle(int, int, int, int) except +
        int x0, y0, x1, y1
        int getLength()
        int getHeight()
        int getArea()
        void move(int, int)
```

Wrapping to Python

```cython
cdef class PyRectangle:
    cdef Rectangle *thisptr  # hold a C++ instance which we're wrapping

    def __cinit__(self, int x0, int y0, int x1, int y1):
        self.thisptr = new Rectangle(x0, y0, x1, y1)

    def __dealloc__(self):
        del self.thisptr

    def getLength(self):
        return self.thisptr.getLength()

    def getHeight(self):
        return self.thisptr.getHeight()

    def getArea(self):
        return self.thisptr.getArea()

    def move(self, dx, dy):
        self.thisptr.move(dx, dy)
```

Conclusion

• The interface is stated three times

• Once in C++, twice in semi-Python

• Makes perfect sense if you are a Python coder wrapping an existing C++ library

• Performance is good overall

• Wrapping is imperative in style

Boost.Python

• Far older interface (dating back to 2002!)

• Write C++ classes

• Define, in C++, how these classes are mapped to Python

Rectangle example again

```cpp
#include <boost/python.hpp>
#include "Rectangle.h"
using namespace boost::python;

BOOST_PYTHON_MODULE(shapes)
{
    class_<Rectangle>("PyRectangle", init<int, int, int, int>())
        .def("getLength", &Rectangle::getLength)
        .def("getHeight", &Rectangle::getHeight)
        .def("getArea", &Rectangle::getArea)
        .def("move", &Rectangle::move)
        ;
}
```

Exposing data members

• .def_readonly("x0", &Rectangle::x0)

• More relevant: exposing an existing getter with Python's property syntax

.add_property("area", &Rectangle::getArea)

• Add a third parameter to have a setter as well

More complex stories

• Define customized rules for how to map Python types to C++ types

• Define declarative ownership rules for objects created in C++

– That’s what the PyRectangle Cython wrapper did imperatively, in code

Sorting data

• Common task

• It is often quicker to keep data unsorted and sort it later than to maintain a properly sorted data structure

– E.g. keep a vector, then sort it, rather than keeping a C++ set (which is a sorted self-balancing tree)
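A sketch of the two strategies, with hypothetical function names. std::multiset is used rather than std::set so that duplicate values are kept and both functions return the same result. The vector version makes one contiguous allocation and sorts once, while the tree version allocates a node per element and rebalances as it goes.

```cpp
#include <algorithm>
#include <set>
#include <vector>

// Strategy 1: collect everything in a contiguous vector, sort once at the end
std::vector<int> sortedViaVector(const std::vector<int>& input) {
    std::vector<int> v(input.begin(), input.end());
    std::sort(v.begin(), v.end());
    return v;
}

// Strategy 2: keep a balanced tree sorted on every insert (one node per element)
std::vector<int> sortedViaTree(const std::vector<int>& input) {
    std::multiset<int> s;
    for (int x : input) s.insert(x);
    return std::vector<int>(s.begin(), s.end());
}
```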

What is needed to sort

• Comparison-based sorting is O(n log n)

– Using a proper algorithm, you can always sort a list in a number of basic operations proportional to n log n

– Since it is only proportional, it doesn’t matter which log base we use

– If sorting 1,000 elements takes T operations, we would expect 1,000,000 elements to take 2000T (not 1000T)
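The 2000T figure follows from the ratio of the two operation counts, which also shows why the log base cancels:

$$
\frac{T(10^6)}{T(10^3)} = \frac{10^6 \log 10^6}{10^3 \log 10^3} = 10^3 \cdot \frac{6 \log 10}{3 \log 10} = 2000
$$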

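Indirection

Which data layout would you use for sorting? Which data layout would you use for accessing the data later?

[Figure: either the elements El1 El2 El3 El4 stored contiguously, or an array of references Ref1 Ref2 Ref3 Ref4 pointing to elements stored elsewhere, in the order El1 El3 El4 El2]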

Indirection

• Just keeping references might seem to make sorting faster

– You don’t need to move the full elements

– This is kind of true

– It depends a bit on how much of the data you need to access for the sorting comparisons

• The overall size is larger in the indirect case

– There is frequently overhead for each allocated element

• Remember: current CPUs are very fast

– When they can predict what to do beforehand

– Moving a chunk of data is predictable
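A sketch contrasting the two layouts, with a hypothetical Element type whose payload stands in for the rest of the data. Sorting the vector of Element moves whole records but keeps them contiguous; sorting the vector of std::unique_ptr moves only pointers, but every comparison follows a pointer to an element that may be anywhere on the heap.

```cpp
#include <algorithm>
#include <memory>
#include <vector>

// Hypothetical record: 'key' decides the order, 'payload' is the rest of the data
struct Element {
    int key;
    double payload[7];
};

bool keyLess(const Element& a, const Element& b) { return a.key < b.key; }

bool keyLessPtr(const std::unique_ptr<Element>& a, const std::unique_ptr<Element>& b) {
    return a->key < b->key;  // dereference: a separate memory access per element
}

// Direct layout: whole elements are moved, the data stays contiguous
void sortDirect(std::vector<Element>& elems) {
    std::sort(elems.begin(), elems.end(), keyLess);
}

// Indirect layout: only pointers move, the elements stay where they were allocated
void sortIndirect(std::vector<std::unique_ptr<Element>>& refs) {
    std::sort(refs.begin(), refs.end(), keyLessPtr);
}
```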

Indirection

• If you sort for speedy access later, “sorting” an indirect data structure will leave the actual data stored all over the place

• When is indirection used?

– Python lists, Python dictionaries

– Java ArrayList, HashMap, etc.

– Java arrays of non-primitive types

– General pointer-based data structures in C and C++

– Cell arrays in Matlab

• When is it not used?

– array module in Python

– NumPy arrays and matrices, Matlab matrices

– C/C++ arrays, and some C++ STL containers (vector, array)

Case in point

• Python list of integers

– Each value is really just 4 bytes

– An entry in a list uses 8 bytes (a pointer) on a 64-bit machine

– The integer object each entry points to is allocated separately, with a minimum size of 24 bytes

• Sorting requires walking over the 8-byte entries, following each to its integer object, and then moving the entries around
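Putting the numbers from this slide together, and assuming every list entry refers to its own integer object (CPython shares the objects for small integers, so lists of small values can use less), a rough lower bound on the memory per element compared to a packed array of 4-byte ints is:

$$
\frac{\text{bytes per element, Python list}}{\text{bytes per element, packed array}} \ge \frac{8 + 24}{4} = 8
$$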

Caches

• Not all memory is equally fast

• Different levels of caches

• If data does not fit into cache, things become slow

• The CPU does prefetching

– If memory accesses follow simple patterns

– Random indirection does NOT

• Cache-friendly code

– Good locality

• Keep using the same part of memory before moving on

– Small working set

• Keep memory usage low

– Good predictability

• Helps the prefetcher
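A classic illustration, sketched with a hypothetical n × n matrix stored row by row in one std::vector: both functions add up the same numbers, but the first walks memory sequentially (good locality, easy to prefetch) while the second jumps n doubles ahead on every step. For matrices much larger than the caches, the second version can be several times slower; the exact factor depends on the machine.

```cpp
#include <cstddef>
#include <vector>

// Row-major traversal: consecutive addresses, one cache line serves several elements
double sumRowMajor(const std::vector<double>& m, std::size_t n) {
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            s += m[i * n + j];
    return s;
}

// Column-major traversal of the same data: stride of n doubles between accesses,
// so for large n almost every access touches a different cache line
double sumColumnMajor(const std::vector<double>& m, std::size_t n) {
    double s = 0.0;
    for (std::size_t j = 0; j < n; ++j)
        for (std::size_t i = 0; i < n; ++i)
            s += m[i * n + j];
    return s;
}
```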

So, how do we sort?

• Do not implement a general sorting algorithm yourself

• In C:

– void qsort(void *base, size_t nitems, size_t size, int (*compar)(const void *, const void*))

• You have bare pointers to data, you need to state the size of each element, and you need a pointer to a function that can do comparisons

• We learned in the printing example that we can do better…

• In C++ (from <algorithm>):

– template <class RandomAccessIterator> void sort (RandomAccessIterator first, RandomAccessIterator last);

– template <class RandomAccessIterator, class Compare> void sort (RandomAccessIterator first, RandomAccessIterator last, Compare comp);
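A side-by-side sketch of the two interfaces on a std::vector<int>; compareInts, sortC, and sortCpp are hypothetical names. With qsort, every comparison goes through a function pointer and untyped void* arguments, whereas std::sort knows the element type and can inline the comparison.

```cpp
#include <algorithm>
#include <cstdlib>
#include <vector>

// C style: untyped pointers, explicit element size, comparison via function pointer
static int compareInts(const void* a, const void* b) {
    int x = *static_cast<const int*>(a);
    int y = *static_cast<const int*>(b);
    return (x > y) - (x < y);  // avoids the overflow risk of returning x - y
}

void sortC(std::vector<int>& v) {
    std::qsort(v.data(), v.size(), sizeof(int), compareInts);
}

// C++ style: the element type is known, and operator< can be inlined
void sortCpp(std::vector<int>& v) {
    std::sort(v.begin(), v.end());
}
```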

Simple and complex sorting

• Elements with a valid < operator can be sorted in increasing order simply by

– sort(data.begin(), data.end());

• Nice trick

– Use the pair class: make tuples of your data where the desired order is represented by the first element

• Harder case: implement a function that takes const (references to) the elements

– Returning true if the first object comes before the second

– false if it comes after or if they are equivalent (strict weak ordering)

– The bugs you can get in any language from invalid comparison code are nasty

• The pair suggestion might not be too bad
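A sketch of the pair trick with a hypothetical task: order words by length, breaking ties alphabetically. std::pair compares its first members and only falls back to the second on a tie, so no hand-written comparison function (and no chance of breaking the strict weak ordering) is needed.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical example: sort words by length, ties broken alphabetically
std::vector<std::string> byLengthThenAlpha(const std::vector<std::string>& words) {
    std::vector<std::pair<std::size_t, std::string>> keyed;
    for (const auto& w : words) keyed.emplace_back(w.size(), w);  // key first
    std::sort(keyed.begin(), keyed.end());  // pair's built-in lexicographic <
    std::vector<std::string> out;
    for (const auto& p : keyed) out.push_back(p.second);
    return out;
}
```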

Functors

• A function is a very real thing in original C

– It’s a single specific piece of code (location in memory) that can be called by the CPU

• In modern C++

– We have templates etc.

• The same piece of source can result in multiple sets of machine code

– Inlining might mean that there is no function call, not even a block of machine instructions for the function

• A function is just code, no data

Functors

• C++ supports operator overloading

• () is another operator

• So, comparing elements can be as simple as

```cpp
struct intComparator
{
    bool operator() (const int left, const int right) const
    {
        return left < right;
    }
};

sort(data.begin(), data.end(), intComparator());
```

Why would you want a functor?

• Auxiliary data

• Settings

• Caching/precalculation to speed up additional function calls

• Keeping statistics

• Any case where you want to inject a piece of code inside an algorithm or library
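As an example of the settings case, here is a sketch of a hypothetical functor that carries a parameter, the center value, into std::sort. A plain function could only get that value from a global variable.

```cpp
#include <algorithm>
#include <cstdlib>
#include <vector>

// Hypothetical functor carrying a setting: orders ints by distance to 'center'
struct CloserTo {
    int center;
    bool operator()(int a, int b) const {
        int da = std::abs(a - center);
        int db = std::abs(b - center);
        if (da != db) return da < db;
        return a < b;  // tie-break so equally distant values get a fixed order
    }
};

std::vector<int> sortedByDistance(std::vector<int> v, int center) {
    std::sort(v.begin(), v.end(), CloserTo{center});
    return v;
}
```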

Sorting again

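```cpp
struct intComparator
{
    int comps;
    intComparator() : comps(0) {}
    bool operator() (const int left, const int right)
    {
        comps++;  // count each comparison
        return left < right;
    }
};

intComparator comparer;
// sort() takes its comparator by value; std::ref (from <functional>)
// makes it call, and count in, this very object instead of a copy
sort(data.begin(), data.end(), std::ref(comparer));
printf("%d\n", comparer.comps);
```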

Functors

• This was nice

• But it moves the logic for the sorting away from the place where we sort

– Logical if it is a general ordering

– But if it’s a general ordering, it should probably just be in the < operator for the elements we sort
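If the ordering really is general, it can live with the type itself. Here is a sketch with a hypothetical Measurement record ordered by date; std::tie builds tuples of references that compare lexicographically, the same idea as the pair trick.

```cpp
#include <algorithm>
#include <tuple>
#include <vector>

// Hypothetical record with a natural, chronological ordering
struct Measurement {
    int year, month, day;
    double value;
};

// The general ordering lives with the type ('value' is deliberately not part of it)
bool operator<(const Measurement& a, const Measurement& b) {
    return std::tie(a.year, a.month, a.day) < std::tie(b.year, b.month, b.day);
}

// No comparator argument needed: sort() uses operator<
void sortChronologically(std::vector<Measurement>& ms) {
    std::sort(ms.begin(), ms.end());
}
```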

Lambda expressions

• You would really like to put the instructions right where they logically belong

• Like…

This?

```cpp
int comps = 0;
sort(data.begin(), data.end(), [&] (int left, int right) {
    comps++;
    return left < right;
} );
printf("%d\n", comps);
```

• And in the C++14 standard, you can even put “auto” for left and right there
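The snippet above as a complete function, sketched under a hypothetical name and using the C++14 auto parameters mentioned on the slide, so it needs a compiler in C++14 mode or later. Because the lambda captures comps by reference, the copies std::sort makes of the lambda all update the same counter.

```cpp
#include <algorithm>
#include <vector>

// Sorts 'data' and returns how many comparisons std::sort made
long long sortCountingComparisons(std::vector<int>& data) {
    long long comps = 0;
    std::sort(data.begin(), data.end(), [&](auto left, auto right) {
        ++comps;  // captured by reference: shared by every copy of the lambda
        return left < right;
    });
    return comps;
}
```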

What was that?

• Lambda expressions

– Common in functional programming

– Common in Python

• Create a functor object in place anywhere

– Put all relevant code in one place

• Local variables can be made accessible within the functor

– “Capturing”

Lambda syntax

• [capture-list] (params) mutable -> return-type { body }

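• Specifying mutable is optional

– It allows the body to modify its copies of variables captured by value

• The return type and arrow do not need to be specified if the return type can be deduced correctly

– Single return statement with evident type (C++14 relaxes this to any body whose return statements agree on one type)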
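Two small sketches of the optional parts, with hypothetical names: mutable lets the lambda change its own copy of a by-value capture, and an explicit -> return type settles cases where deduction would fail, here two return statements with different types (int and double).

```cpp
#include <vector>

// 'mutable' lets the lambda modify its own by-value copy of 'counter';
// the local variable itself is never changed
std::vector<int> firstNumbers(int start, int count) {
    int counter = start;
    auto generate = [counter]() mutable { return counter++; };
    std::vector<int> out;
    for (int i = 0; i < count; ++i) out.push_back(generate());
    return out;
}

// Without '-> double', 'return 0' (int) and 'return x / 2.0' (double)
// would give conflicting deduced return types
auto halfOrZero = [](int x) -> double {
    if (x < 0) return 0;
    return x / 2.0;
};
```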

Capture lists

• If you just specify [], the lambda won’t have access to local variables

• You can also specify a list of variable names [a,b,c]

– These are then captured by value; a copy is made

– That makes them safe to access even after the function that created the lambda has returned

• You can also capture specific variables by reference with & - [&a,&b,&c]

– No copy is made, so these variables must outlive the lambda

• Shorthand: capture all variables by value [=], or all variables by reference [&]

• If you have a method in a class, you can also expose instance variables using [this]
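A sketch of the capture forms, with hypothetical names. makeAdder returns a lambda that outlives the call that created it, which is only safe because step is captured by value; Scaler shows [this]; and sumWithShorthands shows that [=] copies values at the moment the lambda is created.

```cpp
#include <functional>

// Captured by value: safe to call after makeAdder has returned
// ([&step] here would leave a dangling reference)
std::function<int(int)> makeAdder(int step) {
    return [step](int x) { return x + step; };
}

// [this] captures a pointer, so the Scaler must outlive the returned function
struct Scaler {
    int factor;
    std::function<int(int)> asFunction() const {
        return [this](int x) { return x * factor; };
    }
};

int sumWithShorthands(int a, int b) {
    int total = 0;
    auto addAll = [&]() { total += a + b; };  // [&]: every variable by reference
    auto snapshot = [=]() { return total; };  // [=]: copies made here, while total is 0
    addAll();
    return snapshot() + total;                // 0 + (a + b)
}
```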

Performance comparison

• Sorted the same numbers, the same way

– Generate 10,000,000 random numbers

– Sort them

– With functor

• With and without counting comparisons

• With and without inlining

– With lambda

• With and without counting comparisons

• Amounting to 282,306,119 comparisons

• Using GCC 5.2, on Tintin

• Timings repeated a few times; not fully accurate
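A sketch of how such a comparison could be set up; the slides do not show the author's benchmark code, so this is not it. CountingLess keeps a pointer to the caller's counter, so the copies std::sort makes of it all count into the same variable. The names, the seed, and the use of std::chrono::steady_clock are assumptions, and how inlining was disabled for the non-inlined variants is left out.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <random>
#include <vector>

// Functor that counts comparisons through a pointer to the caller's counter
struct CountingLess {
    long long* comps;
    bool operator()(int a, int b) const { ++*comps; return a < b; }
};

// Non-negative random ints from a fixed seed, so runs are repeatable
std::vector<int> randomInts(std::size_t n, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> dist;
    std::vector<int> v(n);
    for (auto& x : v) x = dist(gen);
    return v;
}

// Takes the data by value so that every variant sorts an identical copy
template <typename Compare>
double timeSort(std::vector<int> data, Compare comp) {
    auto t0 = std::chrono::steady_clock::now();
    std::sort(data.begin(), data.end(), comp);
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}
```

For example, with a 10,000,000-element vector v and long long comps = 0, timeSort(v, CountingLess{&comps}) can be compared against timeSort(v, [&](int a, int b) { ++comps; return a < b; }).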

Performance comparison

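| Type | Inlined | Counting | Time (s) |
|---|---|---|---|
| Functor | Yes | No | 1.114 |
| Lambda | Yes | No | 1.119 |
| Functor | No | No | 1.522 |
| Functor | Yes | Yes | 1.102 |
| Lambda | Yes | Yes | 1.205 |
| Functor | No | Yes | 1.523 |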

Performance comparison

• The functor turns out to be faster than the lambda in the counting case

– Probably the variable captured by reference leads to two levels of indirection (get to the lambda data, get the reference to the comps variable, update it)

– Lambdas are perhaps still less optimized than normal objects

• Some uncertainty in the numbers

• Relatively huge overhead from not inlining

– Even a direct (non-indirect) function call is expensive if the operation is simple

• The benefits of inlining can sometimes be even stronger for slightly longer methods

– More data interactions between caller and callee that the optimizer can work on

Summary

• Lambda, auto, range for, and templates are some examples of things that make modern C++ a much more pleasant language to use

• The way these technologies are implemented allows them to give the same or better performance than equivalent C code

– Much faster than Python code

– While the actual code can be similar in style

• Powerful libraries in Boost

• Interactive modern C++ is there, but not as mature as IPython