outline - uppsala universityuser.it.uu.se/~carln/hpc2015_carln1.pdf · • great influence on the...

Carl Nettelblad 2015-11-24

Outline

• Languages

• Cases:

– Printing lists

– Sorting lists

• The discussion will include:

– Templates vs. inheritance

Why is C a good language?

• Fast

• Nothing is hidden

• “Lingua franca”

– Runs everywhere

• For any type of program

• Any kind of distributed/parallel computing

– Can interact with anything

• Compiled and static typing

Why is C a bad language?

• Tedious

– Easily getting stuck in “how”, not “what”

• Long iteration times

– Rebuild after a simple bug

• Unsafe

– Bugs can be devastating

– For scientific codes:

• Complex bugs can be hidden a long time

Why is Python a good language?

• Flexible

– Different abstractions

• Concise

• Good libraries for scientific and non-scientific purposes

• Easy to use for interactive and quick prototyping

Why is Python a bad language?

• Slow

– Default version is interpreted

– Not scaling well with threading

• Flexibility can promote bad habits

– Hard to guarantee that all parts are consistently

used when changes are made

• (Indentation carrying semantic meaning)

What do we want?

• Flexible abstractions

• Good and predictable libraries

• High performance

• Easy interactivity

• Type safety

• This language could be C++!

– Or Python with a mix of C++

Python from Matlab

• Matlab R2014b (8.4) and later support immediate

Python integration

• py.module.function

• E.g. access the highly accurate summation function

fsum using py.math.fsum

• Can work straight away, flat vectors (not matrices)

automatically translated back and forth

• Any Python module built in this course might then be

accessed in Matlab

Python from the web

• IPython platform for interactive Python

• IPython Notebook, web-based interface to IPython

– Combine code, text, and figures

• Kind of like Mathematica

– Easily edit different code snippets

– Press Shift+Enter to (re)compute

C++ from the web

• Jupyter project

– IPython is separated into interactivity engine and

actual Python

cling and clang

• At CERN, the ROOT framework has existed for a long time

– Special classes and an interpreter of a language

similar to C++

– With several oddities

– Interpreted language truly slow

• Effort to rebuild this into using “real” C++

– cling real-time compiler based on clang

– clang is the C++ compiler currently used by Apple

• g++/gcc has been the de-facto standard for (open-source)

C/C++ compilers for a long time

• gcc has an archaic codebase

– Historically not easy to tie into other services

– E.g. get a parse tree

– Or add new code on the fly to an ongoing compilation

process

• Other compilers are closed source

– And also tend to lack flexible APIs

• clang is modularized (the front-end to the separate LLVM

backend) and open-sourced

Other users of clang

• In addition to cling and the Apple compilers, clang is

found in e.g.

– The Nvidia CUDA device compiler is clang-based,

no matter what host compiler you have

– The IDE Ceemple, which tries to bundle a lot of C++

libraries with a separate compiling mode with very

short latency is based on clang

• Keep the compiler loaded with all headers

between reruns

Working on an array in C

#include <stdio.h>

void printIntArray(int* data, int size) {
    for (int i = 0; i < size; i++)
        printf("%d\n", data[i]);
}

Why is this bad?

• Adapted to one specific type of data (int)

• Size is an explicit parameter

– If size is specified incorrectly, we will read invalid memory

• The function can easily change the data

• Data pointer can be invalid

What would this look like in

Python?

def printArray(array):
    for i in array:
        print(i)

What would this look like in C++?

void printIntVector(IntVector* vector) {
    for (int i = 0; i < vector->size(); i++)
        printf("%d\n", vector->get(i));
}

What would this look like in C++?

void printIntVector(vector<int>::iterator begin, vector<int>::iterator end) {
    for (vector<int>::iterator i = begin; i != end; i++)
        cout << *i << "\n";
}

The inheritance abstraction

• An iterator would be a common interface or base class

• This is the case in e.g. Java

• Subclasses inherit from this base class

– Performing iteration in a specific data structure

– Virtual methods for getting next element, current

element etc.

• (Runtime) polymorphism
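
The inheritance approach described above can be sketched in C++ roughly as follows. This is a minimal illustration, not code from the lecture: the method names hasNext and next and the helper sumAll are assumptions chosen to resemble the Java-style interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical common interface: every iterator offers the same virtual methods.
class Iterator {
public:
    virtual ~Iterator() = default;
    virtual bool hasNext() const = 0;  // is there another element?
    virtual int next() = 0;            // return the current element and advance
};

// Subclass performing iteration over one specific data structure.
class IntVectorIterator : public Iterator {
public:
    explicit IntVectorIterator(const std::vector<int>& data) : data_(data) {}
    bool hasNext() const override { return pos_ < data_.size(); }
    int next() override {
        assert(hasNext());  // calling next() past the end is a programming error
        return data_[pos_++];
    }
private:
    const std::vector<int>& data_;
    std::size_t pos_ = 0;
};

// Works with any subclass of Iterator; each call is resolved at runtime.
long sumAll(Iterator& it) {
    long total = 0;
    while (it.hasNext()) {
        total += it.next();
    }
    return total;
}
```

sumAll never needs to know which data structure is behind the iterator. The price is two virtual calls per element.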

How is the method call made?

• Each object has a table of method implementations

• The slot numbers are fixed at compilation

– Any call to an Iterator method will be “call the

method pointed to in the right slot in the vtable”

– This is an indirect jump

[Figure: class diagram showing Iterator with next() and IntVectorIterator with next()]
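
To make the indirect jump concrete, the sketch below writes out by hand roughly what a compiler generates for a virtual call. All names are hypothetical, and real implementations differ in detail. Typically each object stores a pointer to a table shared by all objects of its class.

```cpp
#include <cassert>

struct IteratorObject;

// The "vtable": one slot per virtual method, slot order fixed at compile time.
struct IteratorVTable {
    int (*next)(IteratorObject* self);
};

struct IteratorObject {
    const IteratorVTable* vtable;  // pointer to the table, stored in the object
    const int* current;            // state used by the int-array implementation
};

// One concrete implementation that can be placed in the next() slot.
int intArrayNext(IteratorObject* self) {
    return *self->current++;  // read the current element, then advance
}

const IteratorVTable intArrayVTable = { &intArrayNext };

// What a call like it->next() boils down to:
// load the table pointer, load the slot, jump to whatever address is stored there.
int callNext(IteratorObject* it) {
    assert(it != nullptr && it->vtable != nullptr);
    return it->vtable->next(it);
}
```

The target of the jump in callNext is only known once the slot has been loaded from memory, which is why the CPU has to predict it.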

Indirect jumps

• A modern fast CPU is pipelined and out of order

– Multiple instructions “in flight” at once

– If instructions depend on each other, an out of order core starts

executing a later one

– Pipeline depth 20

• Out of order window of 224 in recent Intel CPU

– Hides waiting on memory

– Latency is the difference between real and theoretical

performance

[Figure: pipeline diagram in which the instruction stream ADD, MOV, CMP, JNZ, MOV, … advances by one instruction per row]

Branch prediction

• Out of order works fine if the instruction stream is

• If you have a loop or an if statement, the CPU has to guess which way the branch will go

– Can actually get pretty good

• A virtual method call is another branch

– In the very worst case, that instruction is not even

cached

Virtual methods in the compiler

• When you call a function directly in C, the compiler can

see everything that happens

– It can inline the function

– Move instructions around

– Do all the optimizations that make a modern

compiler fast, across the function call

• The virtual method call breaks this

– Sometimes the compiler can identify that the same

implementation is always used
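
One way to help the compiler see that the same implementation is always used is the C++11 final specifier. The sketch below uses hypothetical Shape and Square classes. Whether a particular compiler actually turns the call into a direct or inlined call is not guaranteed and depends on the compiler and optimization level.

```cpp
#include <cassert>

struct Shape {
    virtual ~Shape() = default;
    virtual int area() const = 0;
};

// 'final' promises that no class derives from Square, so area() cannot be overridden again.
struct Square final : Shape {
    explicit Square(int side) : side_(side) { assert(side >= 0); }
    int area() const override { return side_ * side_; }
private:
    int side_;
};

// Call through the base class: in general an indirect call via the vtable.
int areaViaBase(const Shape& s) { return s.area(); }

// Call through a final type: the exact implementation is known at compile time,
// so the compiler is allowed to call it directly and may inline it.
int areaViaFinal(const Square& s) { return s.area(); }
```

Both functions return the same value. The difference lies only in how much the optimizer is allowed to know.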

The duck-typing abstraction

• Python uses the concept of duck typing

– “If it walks like a duck, swims like a duck, quacks like

a duck, it is a duck”

– “If an object has all the methods of an iterator, it is

an iterator”

• Convenient, flexible

– You can use inheritance, but you don’t rely on it to

define the contract

• Functions are looked up by name in a data structure

when they are called

– C++ vtables suddenly seem superfast
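
The cost of looking functions up by name can be imitated in C++. The sketch below is only a loose model of dynamic dispatch, not a description of how CPython is implemented. DynamicObject and makeCounter are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>

// Hypothetical model of a dynamically typed object: methods live in a
// dictionary keyed by name.
struct DynamicObject {
    std::map<std::string, std::function<int()>> methods;

    // Every call first searches the dictionary by name, then calls the result.
    int call(const std::string& name) const {
        auto found = methods.find(name);
        assert(found != methods.end());  // a missing method is only detected here, at runtime
        return found->second();
    }
};

// Builds an object that "quacks like" an iterator over 0, 1, 2, ...
DynamicObject makeCounter() {
    auto state = std::make_shared<int>(0);
    DynamicObject obj;
    obj.methods["next"] = [state]() { return (*state)++; };
    return obj;
}
```

Compared with a vtable slot at a fixed offset, each call here involves string comparisons and a tree search before the actual function is reached.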

C++ templates

• Create functions and classes that can work on arbitrary

classes

• Simple motivation

– Type-safe container classes

• vector<int>

• map<int, double>

• These are done at compile-time

• Compiler error messages can be hard to track

– Templates within templates within templates

– Compare this to sudden error at runtime
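
A small sketch of the type-safe containers mentioned above. The helper averageOf is hypothetical and only serves to show both containers in use.

```cpp
#include <cassert>
#include <map>
#include <type_traits>
#include <vector>

// The element types are part of the container type and are checked at compile time.
static_assert(std::is_same<std::vector<int>::value_type, int>::value,
              "vector<int> stores int");
static_assert(std::is_same<std::map<int, double>::mapped_type, double>::value,
              "map<int, double> maps keys to double");

// Hypothetical helper: average of the values stored under the given keys.
double averageOf(const std::map<int, double>& table, const std::vector<int>& keys) {
    assert(!keys.empty());
    double sum = 0.0;
    for (int key : keys) {
        sum += table.at(key);  // at() throws std::out_of_range for a missing key
    }
    return sum / keys.size();
}

// A type error such as the following is rejected by the compiler, not found at runtime:
//     std::vector<int> v;
//     v.push_back("text");
```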

Printing a list

template<typename T>
void printList(T begin, T end) {
    for (T i = begin; i != end; i++)
        printf("%d\n", static_cast<int>(*i));
}

What happened here

• We are doing duck-typing in C++

• We don’t know what T is

– But begin and end are of the same type

– We can get a value with the dereference (*) operator

– That value can be cast to an int

– We can iterate to the next value with ++

• All of this is done at compile time

– Performance

– Correctness
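
The sketch below shows the same template accepting three unrelated iterator types. It is a variant of printList that writes to a caller-supplied stream instead of using printf, an assumption made here so that the output can be checked. writeList, listToString and checkWriteList are hypothetical names.

```cpp
#include <cassert>
#include <ostream>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Same requirements on T as printList: comparable with !=, dereferenceable
// to something castable to int, and incrementable.
template<typename T>
void writeList(std::ostream& out, T begin, T end) {
    for (T i = begin; i != end; ++i) {
        out << static_cast<int>(*i) << "\n";
    }
}

// Hypothetical helper: collect the output for a whole container in a string.
template<typename Container>
std::string listToString(const Container& c) {
    std::ostringstream out;
    writeList(out, c.begin(), c.end());
    return out.str();
}

// Usage: a vector of doubles, a sorted set, and a plain C array all work.
void checkWriteList() {
    assert(listToString(std::vector<double>{1.5, 2.0}) == "1\n2\n");
    assert(listToString(std::set<int>{3, 1, 2}) == "1\n2\n3\n");

    int raw[] = {4, 5};
    std::ostringstream out;
    writeList(out, raw, raw + 2);  // raw pointers are valid iterators too
    assert(out.str() == "4\n5\n");
}
```

The compiler generates a separate, fully typed function for each T, so no runtime dispatch is involved.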

Abstraction costs

• For a simple array, this is just as fast as the C version

– That code could only handle pointer-based int arrays

– But it can be binary trees (set), or a network stream

• For performance, you want to keep runtime costs of the

generalizations and abstractions you make at a

minimum

Printing a list

template<typename T>
void printList(T begin, T end) {
    for (auto i = begin; i != end; i++)
        printf("%d\n", static_cast<int>(*i));
}

Printing a list

template<typename T>
void printList(const T& list) {
    for (auto i : list)
        printf("%d\n", static_cast<int>(i));
}

Printing a list

template<typename T>
void printList(const T& list) {
    for (auto i : list)
        cout << i << "\n";
}

Consequences

• auto keyword

– For local variables, you frequently don’t really care about the type, no

“contract”

– Full typename could change if you change data structures later on

– Just let the compiler figure it out

• const &

– C and C++ send all parameters by value by default

– If you would send a full vector to a function, that could imply copying

the vector

– const means “I don’t want to be able to change this object by

accident”

– & means “I want to work on the original object, not a copy”

– These are semantic differences
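
The difference between passing by value and passing by const reference can be made visible by counting copies. CountedVector and the two size functions below are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical wrapper that counts how often it is copied.
struct CountedVector {
    std::vector<int> values;
    static int copies;  // number of copy constructions so far

    explicit CountedVector(std::vector<int> v) : values(std::move(v)) {}
    CountedVector(const CountedVector& other) : values(other.values) { ++copies; }
};
int CountedVector::copies = 0;

// Parameter passed by value: the caller's object is copied for the call.
std::size_t sizeByValue(CountedVector list) { return list.values.size(); }

// Parameter passed by const reference: works on the original and cannot modify it.
std::size_t sizeByConstRef(const CountedVector& list) { return list.values.size(); }

// Usage: only the by-value call triggers the copy constructor.
void checkCopies() {
    CountedVector data(std::vector<int>{1, 2, 3});
    CountedVector::copies = 0;
    assert(sizeByConstRef(data) == 3 && CountedVector::copies == 0);
    assert(sizeByValue(data) == 3 && CountedVector::copies == 1);
}
```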

Consequences

• for (auto i : list)

– Simple “for each” notation

– Under the hood relying on iterators

– But you can do stuff like

for (auto x : map<int,int>{{1,2}, {3,5}}) {
    printf("%d %d\n", x.first, x.second);
}

• You simply can’t accidentally go outside the range with this syntax
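
The sketch below places a range-based for loop next to an explicit iterator loop that is roughly what it expands to. The function names are hypothetical, and the output goes to a string rather than printf so that the two can be compared.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Range-based for over a map: each element is a (key, value) pair.
std::string withRangeFor(const std::map<int, int>& m) {
    std::ostringstream out;
    for (const auto& x : m) {
        out << x.first << " " << x.second << "\n";
    }
    return out.str();
}

// Roughly the equivalent loop written with an explicit iterator pair.
std::string withIterators(const std::map<int, int>& m) {
    std::ostringstream out;
    for (auto it = m.begin(), end = m.end(); it != end; ++it) {
        const auto& x = *it;
        out << x.first << " " << x.second << "\n";
    }
    return out.str();
}

// Usage: both loops visit the same elements in the same order.
void checkLoops() {
    const std::map<int, int> m{{1, 2}, {3, 5}};
    assert(withRangeFor(m) == "1 2\n3 5\n");
    assert(withIterators(m) == withRangeFor(m));
}
```

In the range-based form there is no index or end iterator for the programmer to get wrong.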

Give your code a Boost

• The C++ standard library is rather thin

– It’s become larger in the last few standards

– You want to interact with the underlying tech (the

OS), not a library faking the OS

– OS libraries are rarely nice C++…

• Also lack of general algorithms and abstractions

• The Boost library (or library of libraries) changes this

• Independent project

– Started at the end of the last millennium

– Libraries added after peer review process, focusing on

generality and “nice interface”

– Varying quality

• Far fewer, but far more stable than arbitrary Perl, Python, or

R libraries

• Great influence on the C++ standards process

– The TR1 document between C++03 and C++11 based several

new libraries on their boost counterparts

– C++11 continued this

– Added language features in C++11 based on “things Boost

could not achieve”

What do we have in Boost?

• Accumulators, Algorithm, Align, Any, Array, Asio, Assert, Assign, Atomic, Bimap,

Bind, Call Traits, Chrono, Circular Buffer, Compatibility, Compressed Pair,

Concept Check, Config, Container, Context, Conversion, Convert, Core, Coroutine,

Coroutine2, CRC, Date Time, Dynamic Bitset, Enable If, Endian, Exception,

Filesystem, Flyweight, Foreach, Format, Function, Function Types, Functional,

Fusion, Geometry, GIL, Graph, Heap, ICL, Identity Type, In Place Factory,

Integer, Interprocess, Interval, Intrusive, IO State Savers, Iostreams, Iterator,

Lambda, Lexical Cast, Local Function, Locale, Lockfree, Log, Math, Member

Function, Meta State Machine, Min-Max, MPI, MPL, Multi-Array, Multi-Index,

Multiprecision, Numeric Conversion, Odeint, Operators, Optional, Parameter,

Phoenix, Pointer Container, Polygon, Pool, Predef, Preprocessor, Program

Options, Property Map, Property Tree, Random, Range, Ratio, Rational, Ref,

Regex, Result Of, Scope Exit, Serialization, Signals, Signals2, Smart Ptr, Sort,

Spirit, Statechart, Static Assert, String Algo, Swap, System, Test, Thread,

ThrowException, Timer, Tokenizer, TR1, Tribool, TTI, Tuple, Type Index, Type

Traits, Typeof, uBLAS, Units, Unordered, Utility, Uuid, Value Initialized, Variant,

Wave, Xpressive

Python and C++

• When you integrate languages with each other, you

need to define:

– Who are you?

– Who are your users?

– Which language is extending the bridge into the

other?

– What features of the two languages need to be

maintained in the bridge?

– Do you have performance concerns?

Cython

• There are many ways to create bindings between

Python and other languages

• Cython generates C++ code from Python code

– Can call into C++ with some work

– The Python parser needs to understand C++

declarations

– The generated C++ code also needs to compile

correctly

• Do not confuse Cython with CPython (normal Python

implementation)

Performance of Cython

• Code can be annotated with exact types

– Allows more optimizations

– Tight loops can be quick

• Still plagued by some of the indirection problems of

Python

– Just as fast as C code interacting closely with

Python

– Not as fast as code in C/C++ with full control over

data structures

– Transition between C and Cython code is very quick

Cython C++ wrapping

class Rectangle {
public:
    int x0, y0, x1, y1;
    Rectangle(int x0, int y0, int x1, int y1);
    ~Rectangle();
    int getLength();
    int getHeight();
    int getArea();
    void move(int dx, int dy);
};

Wrapping to Cython

cdef extern from "Rectangle.h":
    cdef cppclass Rectangle:
        Rectangle(int, int, int, int) except +
        int x0, y0, x1, y1
        int getLength()
        int getHeight()
        int getArea()
        void move(int, int)

Wrapping to Python

cdef class PyRectangle:
    cdef Rectangle *thisptr  # hold a C++ instance which we're wrapping

    def __cinit__(self, int x0, int y0, int x1, int y1):
        self.thisptr = new Rectangle(x0, y0, x1, y1)

    def __dealloc__(self):
        del self.thisptr

    def getLength(self):
        return self.thisptr.getLength()

    def getHeight(self):
        return self.thisptr.getHeight()

    def getArea(self):
        return self.thisptr.getArea()

    def move(self, dx, dy):
        self.thisptr.move(dx, dy)

Conclusion

• Interface stated three times

• One time in C++, two times in semi-Python

• Makes perfect sense if you are a Python coder

wrapping an existing C++ library

• Performance nice overall

• Wrapping is imperative in style

Boost.Python

• Far older interface (dating back to 2002!)

• Write C++ classes

• Define in C++ how these classes are mapped

Rectangle example again

#include <boost/python.hpp>
using namespace boost::python;

BOOST_PYTHON_MODULE(shapes)
{
    class_<Rectangle>("PyRectangle", init<int,int,int,int>())
        .def("getLength", &Rectangle::getLength)
        .def("getHeight", &Rectangle::getHeight)
        .def("getArea", &Rectangle::getArea)
        .def("move", &Rectangle::move);
}

Exposing data members

• .def_readonly("x0", &Rectangle::x0)

• More relevant, exposing existing getter with property

syntax of Python

.add_property("area", &Rectangle::getArea)

• Add a third parameter to have a setter as well
