c++ advanced

page 1 Advanced C++; Gyuszi Suto, Intel 06/11/22 C++0X Standard C++ Standard Library Extensions Technical Report 1 (TR1) Advanced C++ Runtime Improvement Techniques Gyuszi Suto November 2009

Page 1: C++ Advanced

page 1Advanced C++; Gyuszi Suto, Intel 04/11/23

C++0X Standard
C++ Standard Library Extensions

Technical Report 1 (TR1)

Advanced C++
Runtime Improvement Techniques

Gyuszi Suto

November 2009

Page 2: C++ Advanced

References

• The C++ Standard. 2003. Wiley & Sons, Ltd.
• David Vandevoorde, Nicolai M. Josuttis. C++ Templates: The Complete Guide.
• Andrei Alexandrescu. Modern C++ Design: Generic Programming and Design Patterns Applied.
• David Abrahams, Aleksey Gurtovoy. C++ Template Metaprogramming: Concepts, Tools, and Techniques from Boost and Beyond (C++ in Depth Series).
• Bjarne Stroustrup. 1997. The C++ Programming Language. 3rd Edition.
• Pete Becker. 2007. The C++ Standard Library Extensions: A Tutorial and Reference. The course material is based in good part on this book.

Page 3: C++ Advanced

What's in C++0x

Core language runtime performance enhancements
• rvalue reference and move semantics
• generalized constant expressions
• etc.

Core language usability enhancements
• Type inference (decltype, auto)
• Lambda functions and expressions
• Initializer list
• Angle bracket
• Template typedefs
• Range-based for loops
• etc.

Core language functionality improvements
• Variadic templates
• Static assertions
• Type long long int
• New string literals
• Multitasking memory model
• etc.

C++ standard library changes (in namespace std::tr1::)
• Array
• Tuple types
• Hash tables
• forward_list
• Smart pointers
• Function objects
• Bind
• Regular expressions
• Reference wrapper
• etc.

Compiler invocations used in the course:

    /usr/intel/pkgs/icc/11.1.046e/bin/intel64/icc -std=c++0x filename.cc -Wl,-rpath,/usr/intel/pkgs/gcc/4.4.0/lib64
    /usr/intel/pkgs/gcc/4.4.0/bin/g++ -std=c++0x filename.cc

Page 4: C++ Advanced

Course Content

• Type inference
• Lambda
• Traits
• Template Metaprogramming
• Variadic Templates
• tuple, tie
• auto_ptr, shared_ptr, weak_ptr
• C-style array, memory mgmt
• array
• forward_list
• rvalue reference
• STL containers, custom comparators
• Hash tables (aka unordered associative containers): unordered_map, unordered_set
• mem_fn
• reference_wrapper
• function class template
• bind function template

• inlining
• cache performance
• branching, if-less code

• tuple implementation case study

Page 5: C++ Advanced


Type inference, decltype, auto

Similarly to the sizeof operator, the operand of decltype is unevaluated. Informally, the type returned by

decltype(e)

is deduced as follows:

1. If the expression e refers to a variable in local or namespace scope, a static member variable or a function parameter, then the result is that variable's or parameter's declared type

2. If e is a function call or an overloaded operator invocation, decltype(e) denotes the declared return type of that function

3. Otherwise, if e is an lvalue, decltype(e) is T&, where T is the type of e; if e is an rvalue, the result is T

    const int&& foo();
    int i;
    struct A { double x; };
    const A* a = new A();

    decltype(foo()) x1;    // type is const int&& (rvalue reference)
    decltype(i) x2;        // type is int
    decltype(a->x) x3;     // type is double
    decltype((a->x)) x4;   // type is const double&
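The deduction rules above can be checked at compile time. The following is a minimal sketch, assuming a C++11 compiler with `<type_traits>`; the function name `decltype_rules_hold` is hypothetical and exists only to group the checks.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

struct A { double x; };

// Hypothetical helper: compiles only if every deduction listed above holds.
bool decltype_rules_hold() {
    int i = 0;
    const A* a = nullptr;            // never dereferenced: decltype operands are unevaluated
    const std::vector<int> v(3);

    // Rule 1: a plain name or member access yields the declared type.
    static_assert(std::is_same<decltype(i), int>::value, "name -> declared type");
    static_assert(std::is_same<decltype(a->x), double>::value, "member -> declared type");
    // Rule 2: a call yields the declared return type of the function.
    static_assert(std::is_same<decltype(v[0]), const int&>::value, "call -> return type");
    // Rule 3: a parenthesized lvalue yields T&, an rvalue yields T.
    static_assert(std::is_same<decltype((a->x)), const double&>::value, "lvalue -> T&");
    static_assert(std::is_same<decltype(0), int>::value, "rvalue -> T");

    (void)i; (void)a; (void)v;       // silence unused-variable warnings
    return true;
}
```

Because every check is a `static_assert`, a wrong expectation shows up as a compile error rather than a runtime failure.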

Page 6: C++ Advanced

Type inference, decltype, auto

    int main()
    {
        const std::vector<int> v(3);
        auto a = v[0];           // a has type int

        decltype(v[0]) b = 1;    // b has type const int&, the return type of
                                 // std::vector<int>::operator[](size_type) const

        auto c = 0;              // c has type int
        auto d = c;              // d has type int
        decltype(c) e;           // e has type int, the type of the entity named by c
        decltype((c)) f = c;     // f has type int&, because (c) is an lvalue

        c = 3;
        f = 4;
        cout << "c " << c << " f " << f << endl;   // prints c 4 f 4

        decltype(0) g;           // g has type int, because 0 is an rvalue

        for (auto itr = v.begin(); itr != v.end(); ++itr) {
            cout << *itr;
        }
    }

Page 7: C++ Advanced

Lambda

    [] (int x) -> int { return x; } (10);

Parts of the expression, left to right:

[]             capture clause
               []   empty: no local variables can be accessed
               [=]  local variables are accessed by value (copies, not modifiable lvalues)
               [&]  local variables are accessed by reference (lvalues)
(int x)        formal parameters of the lambda function
-> int         specifies the return type (not required if the entire function body is a single return statement)
{ return x; }  body of the lambda function
(10)           actual argument(s) passed to the lambda function
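The three capture modes described above behave differently once the captured variable changes. The following is a small sketch, assuming a C++11 compiler; the function names are hypothetical.

```cpp
#include <cassert>

// Defines a lambda and calls it in place, exactly as in the anatomy above.
int call_in_place() {
    return [] (int x) -> int { return x; } (10);
}

// [=] copies local at the point where the lambda is created.
int capture_by_value() {
    int local = 5;
    auto f = [=] (int x) { return local + x; };
    local = 100;          // the lambda still holds the old copy
    return f(1);          // 5 + 1
}

// [&] refers to the original variable, so the lambda can modify it.
int capture_by_reference() {
    int local = 5;
    auto f = [&] (int x) { local += x; };
    f(2);
    return local;         // 5 + 2
}
```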

Page 8: C++ Advanced

Lambda

    int main(void)
    {
        // lambda function, no access to local variables
        auto lambdaf3 = [] (int i) { return i+3; };
        cout << lambdaf3(4) << "\n";                       // prints 7

        // accesses local variable by reference (and modifies it)
        int localv = 9;
        auto lambdaf4 = [&] (int i) { return localv = i; };
        cout << lambdaf4(7) << ", " << localv << endl;     // modifies localv, prints 7, 7

        int localv2 = 11;
        auto lambdaf5 = [] (int i) { return i + localv2; }; // error, localv2 is not captured
        cout << lambdaf5(4) << endl;

        int localv3 = 13;
        auto lambdaf6 = [=] (int i) { return i + localv3; }; // passing local context by value
        cout << lambdaf6(5) << endl;                       // prints 18
    }

Page 9: C++ Advanced

Lambda

    int main(void)
    {
        int localv3 = 13;
        auto lambdaf7 = [=] (int i) { return localv3 += i; }; // error, localv3 is not a modifiable lvalue

        int a = 3, b = 4;
        auto lambdaf8 = [=, &b] (int i) { return 3 + (b += (a + i)); };
        cout << lambdaf8(3) << ", " << b << endl; // accesses a by value, b by ref, prints 13, 10

        array<int, 5> ia = { 9, -3, 2, 13, -7 };

        // defines and uses a lambda in the context of a standard sort
        std::sort(ia.begin(), ia.end(),
                  [] (int a, int b) { return std::abs(a) < std::abs(b); });

        print_collection(ia); // prints 2 -3 -7 9 13
    }

Page 10: C++ Advanced

Lambda

    int pivot = 5; // global variable

    int main(void)
    {
        array<int, 5> ia = { 9, -3, 2, 13, -7 };

        // defines a named lambda function, used later inside sort
        auto lambdaf1 = [] (int a, int b) { return std::abs(a) < std::abs(b); };
        std::sort(ia.begin(), ia.end(), lambdaf1);
        print_collection(ia); // prints 2 -3 -7 9 13

        // pivot is global in this case, so no capture is needed
        std::sort(ia.begin(), ia.end(),
                  [] (int a, int b) { return std::abs(a - pivot) < std::abs(b - pivot); });
        print_collection(ia); // prints 2 9 -3 13 -7

        int pivot2 = 8; // pivot2 is a local variable, captured by value via [pivot2]
        std::sort(ia.begin(), ia.end(),
                  [pivot2] (int a, int b) { return std::abs(a - pivot2) < std::abs(b - pivot2); });
        print_collection(ia); // prints 9 13 2 -3 -7
        return 0;
    }

Page 11: C++ Advanced

Lambda

    class X
    {
    public:
        int a, b; // class member data
        // constructor
        X(int aa = 0, int bb = 2) : a(aa), b(bb) {}

        void memfun1(void) {
            int c = 2, d = 3; // local variables
            // lambda function definition and call
            [&, d] (int k) -> void {
                cout << a << ", " << b << ", " << c << ", " << d << ", " << (c+=k) << endl;
                return;
            } (7);
        }

        void memfun2(void) {
            // named lambda function, accesses this
            auto lambda12 = [this] (int k) { return this + k; };
            cout << lambda12(0xFF) << endl;

            // named lambda function, tries to capture this by reference.
            // The 2009 compiler used here accepted it with a warning;
            // the final C++11 grammar does not allow [&this].
            auto lambda13 = [&this] (int k) -> X * { a += 3; return this + k; };
            cout << lambda13(0xEE) << endl;
        }
    };

    int main(void)
    {
        X x1;
        cout << "in X" << endl;
        x1.memfun1();                                  // prints 0, 2, 2, 3, 9

        cout << x1.a << " before the call" << endl;   // prints 0
        x1.memfun2();                                  // prints 0x7fbfffcd00
        cout << x1.a << " after the call" << endl;    // prints 3
    }

Page 12: C++ Advanced

Lambda

    auto lambdaf3 = [] (int i) { return i+3; };
    cout << lambdaf3(4) << "\n"; // prints 7

    // examples of calling lambda from lambda
    auto lambdaf13 = [] (int j) { return lambdaf3(j) + 4; };  // error, lambdaf3 is not captured
    cout << lambdaf13(2) << endl;

    auto lambdaf13 = [&] (int j) { return lambdaf3(j) + 4; }; // okay, lambdaf3 known by reference
    cout << lambdaf13(2) << endl; // prints 9

    auto lambdaf14 = [=] (int j) { return lambdaf3(j) + 4; }; // error with the 2009 compiler used here:
                                                              // it could not pass lambdaf3 by value
                                                              // (closure objects are copyable in final C++11)
    cout << lambdaf14(2) << endl;

    // local variable
    int ii = 6;
    auto lambdaf15 = [&] (int k) -> int {
        int i15 = 5;
        auto lambdaf15_1 = [=] (int m) { return i15 + ii + m; }; // lambda inside lambda
        ii = k + 4;
        return lambdaf15_1(k + 2) + 3;
    };
    cout << "lambdaf15 " << lambdaf15(3) << " ii " << ii << endl; // prints lambdaf15 19 ii 7

    array<int, 4> a14 = { 1, 2, 3, 4 };

    // print the factorial for each element of array a14
    for_each(a14.begin(), a14.end(), [] (int k) -> void {
        // unnamed lambda, will be called for each element of array a14
        // recursive factorial
        function<int (int)> factorial =
            [&factorial] (int kk) { return kk == 0 ? 1 : factorial(kk-1) * kk; }; // ok

        // error: expression must have (pointer-to-) function type (??)
        // (an auto variable cannot be used inside its own initializer)
        auto factorial = [&factorial] (int kk) { return kk == 0 ? 1 : factorial(kk-1) * kk; };

        cout << k << ": " << factorial(k) << "; ";
    }); // prints 1: 1; 2: 2; 3: 6; 4: 24;

Page 13: C++ Advanced

Un-named lambda, named lambda and function

    void foo()
    {
        int i, j; // local variables

        // … need a function here, called once; no other place needs it:
        //   make it an unnamed lambda; it will know the context i, j

        // … need a function here that knows the context i, j, and it will be
        //   called here and in other locations of this function:
        //   make it a named lambda

        // … need to call a function here that may also be called from other
        //   functions and does not know about the context of foo():
        //   make it an outside function xx() and call it here
    }

    // this function does not know about the stack frame of the caller;
    // if the caller needs to pass its context, then
    // it needs to send it in via function arguments
    void xx()
    {
        int k, l, m;
        …
    }
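The three choices above can be sketched in one function. This is only an illustration, assuming a C++11 compiler; `clamp_to`, `process`, and the range limits are hypothetical names and values, not part of the course material.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Outside function: it sees none of the caller's locals,
// so the context (lo, hi) must come in as arguments.
int clamp_to(int value, int lo, int hi) {
    return std::max(lo, std::min(value, hi));
}

// Clamps every element into [lo, hi] and returns how many elements changed status.
int process(std::vector<int>& data) {
    int lo = 0, hi = 10;                       // local context

    // Named lambda: needs lo/hi and is used at two places in this function.
    auto in_range = [lo, hi] (int v) { return v >= lo && v <= hi; };
    int before = static_cast<int>(std::count_if(data.begin(), data.end(), in_range));

    // Unnamed lambda: used once, right where it is needed.
    std::for_each(data.begin(), data.end(),
                  [lo, hi] (int& v) { v = clamp_to(v, lo, hi); });

    int after = static_cast<int>(std::count_if(data.begin(), data.end(), in_range));
    return after - before;                     // elements that were out of range
}
```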

Page 14: C++ Advanced

Template Traits

    // template trait
    template <class T>
    class MMAX {
    public:
        enum { mmax = 10 };
    };

    template <>
    class MMAX<int> {
    public:
        enum { mmax = 1000 };
    };

    template <>
    class MMAX<float> {
    public:
        enum { mmax = 20000 };
    };

    // TRAIT is a template default type argument
    template <class V, class TRAIT = MMAX<V> >
    class Luba {
    public:
        V v[TRAIT::mmax];   // relying on mmax enum of TRAIT
    };

    int main(void)
    {
        Luba<int, MMAX<int> > myLuba;
        Luba<float> myLuba2;
        return 0;
    }

Type V and type TRAIT are decoupled.

Page 15: C++ Advanced

Template Traits cont'd

    const char * table[] = {
        "Unknown", // 0
        "Int",     // 1
        "Float",   // 2
        "X"        // 3
    };

    // generic template
    template <class T>
    class Trait {
    public:
        static const int index = 0;
    };

    // template specializations
    template <>
    class Trait<int> {
    public:
        static const int index = 1;
    };

    template <>
    class Trait<X> {
    public:
        static const int index = 3;
    };

    template <class T, class R = Trait<T> >
    class W {
    public:
        T val;
        static const char * name() { return table[R::index]; }
    };
    …
    cout << W<int>::name();                  // prints "Int", relies on default argument Trait<int>
    cout << W<X, Trait<X> >::name();         // prints "X"
    cout << W<char, Trait<char> >::name();   // prints "Unknown"
    cout << W<double, Trait<int> >::name();  // prints "Int" !!

Page 16: C++ Advanced

Template Metaprogramming

    template <int N>
    class Factorial {
    public:
        enum { value = N * Factorial<N-1>::value };
    };

    template <>
    class Factorial<1> { // template specialization for 1
    public:
        enum { value = 1 };
    };

    …
    // evaluated at compile time
    std::cout << Factorial<3>::value; // prints 6
    int a[Factorial<4>::value];       // 24

    template <unsigned long N>
    struct binary {
        static unsigned const value =
            binary<N/10>::value * 2   // prepend higher bits
            + N%10;                   // to lowest bit
    };

    template <>                       // specialization
    struct binary<0> {                // terminates recursion
        static unsigned const value = 0;
    };

    int b[binary<1101>::value];       // compile-time evaluation to 13

All of this is done at compile time.
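One way to confirm that these values really are compile-time constants is to use them in `static_assert` and as array bounds. The sketch below assumes a C++11 compiler; `array_length` is a hypothetical helper added for illustration.

```cpp
#include <cassert>

template <int N>
struct Factorial { enum { value = N * Factorial<N - 1>::value }; };

template <>
struct Factorial<1> { enum { value = 1 }; };   // terminates the recursion

template <unsigned long N>
struct binary { static unsigned const value = binary<N / 10>::value * 2 + N % 10; };

template <>
struct binary<0> { static unsigned const value = 0; };

// These fail to compile if the values were not computed by the compiler.
static_assert(Factorial<4>::value == 24, "4! computed at compile time");
static_assert(binary<1101>::value == 13, "binary 1101 is decimal 13");

// An array bound must be a constant expression, so this also proves the point.
int array_length() {
    int a[Factorial<4>::value];
    return static_cast<int>(sizeof(a) / sizeof(a[0]));
}
```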

Page 17: C++ Advanced

Variadic Templates

    #include <iostream>
    #include <typeinfo>

    template <int size, int... ints>       // packing ints
    class X {
    public:
        void foo(void) {
            int x[size] = { ints... };     // unpacking ints
            for (int i = 0; i < size; ++i) {
                std::cout << x[i] << " ";
            }
            std::cout << "\n";
        }
    };

    int main(void)
    {
        X<3, 8, 9, 10> x1;
        x1.foo(); // prints 8 9 10

        X<4, 11, 12, 13, 14> x2;
        x2.foo(); // prints 11 12 13 14

        return 0;
    }

Page 18: C++ Advanced

Variadic Templates

    void myprintf(const char * s) {} // no-op

    template <typename T, typename... Args>
    void myprintf(const char* s, T value, Args... args) // args are packed
    {
        while (*s) {
            if (*s == '%' && *(++s) != '%') {
                std::cout << value;
                // args are unpacked below
                myprintf(s, args...); // call even when *s == 0 to detect extra arguments
                return;
            }
            std::cout << *s++;
        }
    }

    #if 0
    % nm a.out | grep myprintf
    00000000004009cb t _GLOBAL__I__Z8myprintfPKc
    0000000000400b0c W _Z8myprintfIdIEEvPKcT_DpT0_
    0000000000400a7a W _Z8myprintfIfIdEEvPKcT_DpT0_
    00000000004009e0 W _Z8myprintfIiIfdEEvPKcT_DpT0_
    00000000004008ec T _Z8myprintfPKc
    #endif

    int main(void)
    {
        int i = 5; float f = 6.6; double d = 9.9;

        myprintf("i=% f=% d=%", i, f, d);      // prints i=5 f=6.6 d=9.9
        myprintf("i=% f=%", i, f, d);          // prints i=5 f=6.6
        myprintf("i=% f=% d=% d=%", i, f, d);  // prints i=5 f=6.6 d=9.9

        return 0;
    }

Page 19: C++ Advanced

Variadic Templates

    class X { int i; };
    class Y { float f; };
    class Z { double d; };

    // forward declaration
    template <typename... E>
    struct TI;

    // template specialization for no type argument
    template <>
    struct TI<> { };

    // general (and recursive) template class definition
    template <typename H, typename... T>            // packing type T
    struct TI<H, T...> : public H, public TI<T...>  // unpacking type T
    {};

    int main(void)
    {
        cout << sizeof(TI<X, Y, Z>) << endl;           // prints 24
        cout << sizeof(TI<vector<int>, X, Y>) << endl; // prints 32
        return 0;
    }

[Figure: TI<vector<int>, X, Y> laid out as vector<int>, X, Y through multiple inheritance]

This is an example of the building block of tuples (TupleImplement, TI) using variadic templates. The example is just a small piece of the actual tuple implementation.

Page 20: C++ Advanced

tuple<T0, T1, T2, .., T9>: the std::pair<T1, T2> on steroids

    pair<int, float> p(3, 9.9);
    cout << p.first << endl;               // prints 3

    pair<int, pair<float, double> > pxx(5, make_pair(8.8, 11.11));
    cout << pxx.second.first << endl;      // prints 8.8

    tuple<int> ti(2);
    cout << get<0>(ti) << endl;            // prints 2

    tuple<int, float, char> tifc(3, 9.9, 'c');
    cout << get<2>(tifc) << endl;          // prints c

    tuple<int, tuple<char, char, char>, float, int> txxx(3, make_tuple('c', 'd', 'e'), 13.33, 7);
    cout << get<2>(get<1>(txxx)) << endl;  // prints e

[Figure: layout of txxx: int, (char, char, char), float, int]

Page 21: C++ Advanced

tuple constructing

The tuple constructor takes the tuple elements as arguments. In Boost.Tuple, the constructor of an n-element tuple can be invoked with k arguments, where 0 <= k <= n; std::tr1::tuple accepts either no arguments or all n. For example (sizeof values are implementation-specific):

    tuple<> t;                 // sizeof == 2
    tuple<char> tc;            // sizeof == 3
    tuple<char, char> tcc;     // sizeof == 3
    tuple<int, int> t2;        // sizeof == 12
    tuple<int, int, int, int, int, int, int, int, int> t9; // sizeof == 40

If no initial value for an element is provided, it is default initialized (and hence must be default initializable). For example:

    class X {
    public:
        X(const string & s); // the only constructor for X
    };

    tuple<X, X, X> t4; // error, no default constructor for X
    tuple<X, X, X> t5(string("Jaba"), string("Daba"), string("Duu")); // ok

Page 22: C++ Advanced

tuple constructing

References must be constructed explicitly, for example:

    tuple<double &> t(4.2);             // error, cannot refer to a temporary value
    tuple<const double &> tdcr(8.0);    // OK
    const tuple<double &> ctdr(8.0);    // error

    double d = 3.8;
    tuple<double &> t(d);               // OK
    cout << typeid(t).name();
    // prints: class std::tr1::tuple<double &, Nil, Nil, Nil, Nil, Nil, Nil, Nil, Nil>

    tuple<double &> t2(d + 0.2);        // error, cannot initialize non-const reference with a temporary
    tuple<const double &> t3(d + 0.3);  // ok

Page 23: C++ Advanced

make_tuple

make_tuple is a more convenient way to create tuples:
• no need to specify the types
• types are deduced to the plain, value-based, non-reference type

    int i = 3;
    float f = 5.5;
    make_tuple(i, f);    // makes a tuple<int, float>

    make_tuple(8, 9.9);  // makes a tuple<int, double>

    tuple<int, int, double> add_multiply_divide(int a, int b) {
        return make_tuple(a+b, a*b, double(a)/double(b)); // results in tuple<int, int, double>
    }

    class A {};
    class B {};

    void foo(const A & a, B & b) {
        make_tuple(a, b); // results in tuple<A, B>: types reduced to plain, non-ref type
    }

Page 24: C++ Advanced

make_tuple, ref, cref

    // ref and cref are reference wrappers from the <functional> header file (see later)

    A a; B b; const A ca = a;

    make_tuple(cref(a), b);        // creates tuple<const A&, B>
    make_tuple(ref(a), b);         // creates tuple<A&, B>
    make_tuple(ref(a), cref(b));   // creates tuple<A&, const B&>
    make_tuple(cref(ca));          // creates tuple<const A&>
    make_tuple(ref(ca));           // creates tuple<const A&>

    char aa, bb;
    make_tuple(aa, bb);            // creates tuple<char, char>, sizeof == 3
    make_tuple(ref(aa), ref(bb));  // creates tuple<char &, char &>, sizeof == 12

    int iii = 22, jjj = 100;

    // creates a tuple<int &, int &>; the first element refers to iii
    cout << (get<0>(make_tuple(ref(iii), ref(jjj))) = 66) << " iii " << iii << "\n";
    // prints 66 iii 66
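The type deductions claimed for make_tuple, ref and cref can be verified by the compiler. The following sketch assumes a C++11 standard library, where these names live in `std::` rather than `std::tr1::`; `make_tuple_types_hold` is a hypothetical helper.

```cpp
#include <cassert>
#include <functional>
#include <tuple>
#include <type_traits>

struct A {};
struct B {};

// Hypothetical helper: compiles only if make_tuple deduces the types listed above.
bool make_tuple_types_hold() {
    A a; B b; const A ca = a;

    // Plain arguments decay to value types, even when the source is const.
    static_assert(std::is_same<decltype(std::make_tuple(ca, b)),
                               std::tuple<A, B>>::value, "values are stored by copy");
    static_assert(std::is_same<decltype(std::make_tuple(8, 9.9)),
                               std::tuple<int, double>>::value, "literals keep their types");
    // ref / cref request reference elements.
    static_assert(std::is_same<decltype(std::make_tuple(std::ref(a), std::cref(b))),
                               std::tuple<A&, const B&>>::value, "wrappers become references");
    (void)a; (void)b; (void)ca;

    // A reference element really aliases the original variable.
    int iii = 22;
    std::get<0>(std::make_tuple(std::ref(iii))) = 66;
    return iii == 66;
}
```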

Page 25: C++ Advanced

tuple, accessing elements

    get<N>(t)

where t is a tuple object and N is a constant integral expression specifying the index of the element to be accessed. Depending on whether t is const or not, get returns the Nth element as a reference to const or non-const type. The index of the first element is 0, so N must be between 0 and k-1, where k is the number of elements in the tuple. Violations of these constraints are detected at compile time.

    template<int Idx, class T1, class T2, ..., class TN>
    RI get(tuple<T1, T2, ..., TN>& tpl);

    template<int Idx, class T1, class T2, ..., class TN>
    RI get(const tuple<T1, T2, ..., TN>& tpl);

The template functions return a reference to the value at index Idx in the tuple object tpl. If the corresponding type Ui is a reference type, both functions return Ui; otherwise the first function returns Ui& and the second function returns const Ui&.

Examples:

    double d = 2.7; A a;
    tuple<int, double&, const A&> t(1, d, a);
    const tuple<int, double&, const A&> ct = t;

    int i = get<0>(t);      // ok
    int j = get<0>(ct);     // ok
    get<0>(t) = 5;          // ok
    get<0>(ct) = 5;         // error, can't assign to const

    double e = get<1>(t);   // ok
    get<1>(t) = 3.14;       // ok
    get<2>(t) = A();        // error, can't assign to const
    A aa = get<3>(t);       // error: index out of bounds

    ++get<0>(t);            // ok, can be used as any variable; get returns a reference, in-place ++

    // tuple-like access is provided for std::pair
    pair<int, float> pif(3, 4.4);
    cout << get<1>(pif);    // prints 4.4

Page 26: C++ Advanced

ties - tuples

Ties are tuples where all elements are of non-const reference types.

    int i; char c; double d;
    ...
    tie(i, c, d);

The above tie function creates a tuple of type

    tuple<int&, char&, double&>

The same result could be achieved with the call

    make_tuple(ref(i), ref(c), ref(d));

A tuple that contains non-const references as elements can be used to 'unpack' another tuple into variables, e.g.:

    char c; double d; int i;
    tie(i, c, d) = make_tuple(1, 'a', 5.5);
    std::cout << i << " " << c << " " << d; // prints 1 a 5.5

[Figure: the stack variables d (0x8000), c (0x8004) and i (0x8008) are non-temporary. tie(i, c, d) builds a temporary tuple of references (int & to 0x8008, char & to 0x8004, double & to 0x8000); make_tuple(1, 'a', 5.5) builds a temporary tuple of values (int 1, char 'a', double 5.5) that is assigned through those references.]

Page 27: C++ Advanced

ties - tuples

Ignore

There is also an object called ignore, which allows you to ignore an element assigned by a tuple. The idea is that a function may return a tuple, only part of which you are interested in. For example:

    int i; char c = 'x'; double d;
    tuple<int, char, double> tup(2, 'b', 6.6);
    tie(i, ignore, d) = tup;
    cout << i << " " << c << " " << d; // prints 2 x 6.6

[Figure: the stack variables d (0x8000), c (0x8004), i (0x8008) and tup (0x800C) are non-temporary. tie(i, ignore, d) builds a temporary tuple holding int & to 0x8008 and double & to 0x8000; only those two are assigned from tup (int 2, char 'b', double 6.6), so c keeps its value.]

    // tying to a std::pair

    int i; char c = 'x';
    pair<int, char> pp(33, 'w');
    tie(i, c) = pp;                    // doesn't compile (??)
    tie(i, c) = tuple<int, char>(pp);  // works OK
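A typical use of tie and ignore is unpacking a function's tuple result. The sketch below assumes a C++11 standard library (`std::tie`, `std::ignore`); it reuses the `add_multiply_divide` function from the make_tuple slide, and `sum_plus_quotient` is a hypothetical caller.

```cpp
#include <cassert>
#include <tuple>

// Returns (a + b, a * b, a / b) in one tuple.
std::tuple<int, int, double> add_multiply_divide(int a, int b) {
    return std::make_tuple(a + b, a * b, double(a) / double(b));
}

// Unpacks only the sum and the quotient; the product is discarded via std::ignore.
double sum_plus_quotient(int a, int b) {
    int sum = 0;
    double quotient = 0.0;
    std::tie(sum, std::ignore, quotient) = add_multiply_divide(a, b);
    return sum + quotient;
}
```

The temporary tuple of references created by `std::tie` lives only for the duration of the assignment; the values end up in the named local variables.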

Page 28: C++ Advanced

tuples - performance

All tuple access and construction functions are small inlined one-liners. Therefore, a decent compiler can eliminate any extra cost of using tuples compared to using hand-written tuple-like classes. In particular, with a decent compiler there is no performance difference between this code:

    class hand_made_tuple {
        A a; B b; C c;
    public:
        hand_made_tuple(const A& aa, const B& bb, const C& cc)
            : a(aa), b(bb), c(cc) {}
        A& getA() { return a; }
        B& getB() { return b; }
        C& getC() { return c; }
    };

    hand_made_tuple hmt((A()), B(), C()); // extra parentheses avoid parsing this as a function declaration
    hmt.getA(); hmt.getB(); hmt.getC();

and this code:

    tuple<A, B, C> t((A()), B(), C());
    get<0>(t); get<1>(t); get<2>(t);

• There is a memory overhead of 1-3 bytes per tuple; this may depend on the compiler and/or optimization level.
• Compiler error messages are very hard to understand!

Page 29: C++ Advanced

Recap auto_ptr<T>

    auto_ptr<X> ap0(new X());  // ap0 -> X
    auto_ptr<X> ap1;           // ap0 -> X, ap1 -> 0
    ap1 = ap0;                 // ap0 releases X: ap0 -> 0, ap1 -> X

    // ap1 destructor
    // calls X destructor

• One object can be pointed to by at most one auto pointer.
• When the auto pointer goes out of scope, it deletes the object it points to.
• auto_ptr was accepted hastily into the standard; the ANSI committee agrees it was a mistake (C++0x deprecates it).

TR1 topics:
• Tuples
• Smart Pointers
• The array Class Template
• Unordered Associative Containers (hash tables)
• mem_fn Function Template
• reference_wrapper
• The function Class Template
• The bind Function Template
• Type Traits
• Regular Expressions

Page 30: C++ Advanced

Usage of - and Issues with - auto_ptr<T>

    class X { … };

    int main(void)
    {
        const auto_ptr<X> cap(new X(4));
        auto_ptr<X> bb = cap;           // compiler error, no copy semantics from const
        vector<auto_ptr<X> > xv;
        try {
            auto_ptr<X> ap1(new X(3));  // ap1 points to an X
            xv.push_back(ap1);          // compile error: no copy constructor for auto_ptr
            {
                auto_ptr<X> ap3, ap4;   // both point to null
                ap3 = ap1;              // ap1.release() is called internally, ap3 points to X
                ++ap3;                  // compiler error, cannot increment auto pointer
                ap1->x = 7;             // ap1 is now null: undefined behavior (the slide's platform
                                        // reported it as a throw; that is not guaranteed)
                ap4 = ap3;              // ap3.release(), ap4 points to X
                X & xr = *ap4;          // both ap4 and xr refer to same X
                X * xp = ap4;           // compiler error
                X * xp2 = ap4.get();    // OK, uses auto_ptr.get() member function to get an X*
            }                           // ap4 goes out of scope, X is destroyed
            ap1->x = 3;                 // null dereference again (reported as a throw on the slide)
        } catch (...) {
            cout << "caught throw \n";
        }
        return 0;
    }

Page 31: C++ Advanced

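shared_ptr

Each step shows the code executed and the resulting state.

1.  shared_ptr<X> p0;
    auto_ptr<X> ap(new X);
    p0.unique() -> false, p0.use_count() -> 0, (bool)p0 -> false
    State: p0 is empty; ap owns X.

2.  p0 = ap;
    p0.unique() -> true, p0.use_count() -> 1, (bool)p0 -> true
    State: p0 owns X; ap has released it.

3.  shared_ptr<X> p1 = p0;
    X * xp = p0.get();
    p0.unique() -> false, p0.use_count() -> 2 (xp is not counted), (bool)p0 -> true
    State: p0 and p1 share X.

4.  shared_ptr<X> p2 = p1;
    p0.unique() -> false, p0.use_count() -> 3, (bool)p0 -> true
    State: p0, p1 and p2 share X.

5.  p0.reset();
    p1.unique() -> false, p1.use_count() -> 2, (bool)p1 -> true
    State: p0 is empty; p1 and p2 share X.

6.  p2.reset(new X);
    p1.unique() -> true, p1.use_count() -> 1, (bool)p1 -> true
    State: p1 owns the first X; p2 owns a second X.

7.  p1.swap(p2);
    p1.unique() -> true, p1.use_count() -> 1, (bool)p1 -> true
    State: p1 owns the second X; p2 owns the first X.

8.  p2 = p1;
    p1.unique() -> false, p1.use_count() -> 2, (bool)p1 -> true
    State: destructor call for the first X, which p2 has just given up; p1 and p2 share the second X; xp now dangles.

9.  // p0, p1 and p2 go out of scope
    State: destructor call for the second X when its last owner is destroyed. If the three pointers are locals of one scope declared in the order shown, they are destroyed in reverse order: p2, then p1 (which destroys X), then p0.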
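The reference counts in the sequence above can be replayed in code. This is a sketch under a few assumptions: a C++11 standard library (`std::shared_ptr`), the auto_ptr step is skipped because auto_ptr no longer exists in current standards, and `X::live` is a hypothetical instance counter added to observe destructor calls.

```cpp
#include <cassert>
#include <memory>

struct X {
    static int live;           // hypothetical counter of X objects currently alive
    X()  { ++live; }
    ~X() { --live; }
};
int X::live = 0;

// Replays the copy / reset / swap / assign sequence and checks each count.
bool replay_walkthrough() {
    std::shared_ptr<X> p0(new X);
    bool ok = p0.use_count() == 1;

    std::shared_ptr<X> p1 = p0;
    X* xp = p0.get();                       // raw pointer: not counted
    (void)xp;
    ok = ok && p0.use_count() == 2;

    std::shared_ptr<X> p2 = p1;
    ok = ok && p0.use_count() == 3;

    p0.reset();                             // p0 becomes empty
    ok = ok && !p0 && p1.use_count() == 2;

    p2.reset(new X);                        // p2 now owns a second X
    ok = ok && p1.use_count() == 1 && X::live == 2;

    p1.swap(p2);                            // owners exchange objects
    ok = ok && p1.use_count() == 1;

    p2 = p1;                                // the first X loses its last owner here
    ok = ok && p1.use_count() == 2 && X::live == 1;
    return ok;                              // the second X is destroyed on return
}
```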

Page 32: C++ Advanced

shared_ptr convertible pointers

1.  shared_ptr<X> p0(new X(33, 44));
    p0.unique() -> true, p0.use_count() -> 1
    State: p0 owns X.

2.  shared_ptr<void> vp(p0); // X* is convertible to void*
    p0.use_count() -> 2, vp.use_count() -> 2
    State: p0 and vp share X.

3.  p0.reset();
    p0.use_count() -> 0, vp.use_count() -> 1
    State: p0 is empty; vp owns X.

4.  vp.reset();
    // vp knows what destructor to call
    // because the shared_ptr<T> constructor is
    // templated on the argument (2 templates)
    p0.use_count() -> 0, vp.use_count() -> 0
    State: X destructor is called.

5.  shared_ptr<void> vp2(new X(1, 2));
    vp2.use_count() -> 1

6.  shared_ptr<void> vp3(vp2);
    vp2.use_count() -> 2

7.  shared_ptr<X> p4(vp2); // Error
    vp2 has no type knowledge to allow construction of p4.

8.  shared_ptr<X> p5(*(reinterpret_cast<shared_ptr<X> *>(&vp2))); // OK, ugly
    vp2.use_count() -> 3
    State: vp2, vp3 and p5 share X. TR1 also provides static_pointer_cast<X>(vp2) for this conversion.
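For getting a typed pointer back from a shared_ptr<void>, the library offers `static_pointer_cast` as an alternative to the reinterpret_cast shown above. A minimal sketch, assuming a C++11 standard library; `recover_typed_pointer` is a hypothetical function name.

```cpp
#include <cassert>
#include <memory>

struct X {
    int a, b;
    X(int aa, int bb) : a(aa), b(bb) {}
};

// Converts shared_ptr<void> back to shared_ptr<X> and reads through it.
// Returns -1 if the two pointers unexpectedly do not share ownership.
int recover_typed_pointer() {
    std::shared_ptr<void> vp2(new X(1, 2));

    // The caller must know that vp2 really refers to an X;
    // static_pointer_cast performs no runtime check.
    std::shared_ptr<X> p5 = std::static_pointer_cast<X>(vp2);

    // Both pointers share one reference count.
    return (vp2.use_count() == 2) ? p5->a + p5->b : -1;
}
```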

Page 33: C++ Advanced

shared_ptr

    shared_ptr<X> p0(new X);
    shared_ptr<X> p1(p0);
    shared_ptr<void> v0(p0);
    shared_ptr<X> p2(new X);

[Figure: each of p0, p1, v0 and p2 holds two 4-byte members, _Ptr and _Rep. p0, p1 and v0 point to the same X and share one 16-byte _Ref_cnt_base; p2 points to a second X with its own 16-byte _Ref_cnt_base.]

Note: 16 bytes of overhead per object and 4 bytes of overhead per pointer (may be compiler dependent).

v0.get() returns _Ptr (X *)

Page 34: C++ Advanced

shared_ptr<T> pointing to objects on heap, stack, static memory

    class X {
    public:
        X(const string &);  // constructor
        X(const X &);       // copy constructor
        ~X();               // destructor
    };

    // global object in static memory
    X xg2("def");

    // global shared X pointer in static memory
    shared_ptr<X> gsp; // currently null

    void testSharedPtrToAuto(void)
    {
        shared_ptr<X> sp1(new X("abc")); // sp1 points to X on heap

        shared_ptr<X> sp2(&xg2);         // sp2 points to xg2, compiles but dangerous!!
        shared_ptr<X> sp3;               // sp3 set to null
        // sp3 = &xg2;                   // compiler error
        X localX("def");                 // one X on stack
        // gsp = &localX;                // compiler error
        shared_ptr<X> sp4(&localX);      // compiles, dangerous, 2 destructor calls to same object!!

        // when sp2 goes out of scope, it calls delete on xg2, which was never
        // allocated with new: bad, crash!
    }
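If a shared_ptr must refer to an object that was not created with new, one option is the deleter-taking constructor listed in the shared_ptr API, with a deleter that does nothing. The sketch below assumes a C++11 standard library; `NoDelete` and the `destroyed` counter are hypothetical helpers for illustration.

```cpp
#include <cassert>
#include <memory>

struct X {
    static int destroyed;      // hypothetical counter of destructor calls
    ~X() { ++destroyed; }
};
int X::destroyed = 0;

// Hypothetical deleter: leaves the object alone, because the stack owns it.
struct NoDelete {
    void operator()(X*) const {}
};

// Returns how many times the stack object's destructor ran.
int destructor_calls_for_stack_object() {
    int before = X::destroyed;
    {
        X localX;
        std::shared_ptr<X> sp(&localX, NoDelete());
        // sp is released first (a no-op), then localX is destroyed exactly once.
    }
    return X::destroyed - before;
}
```

This removes the double destruction, but the shared_ptr and any copies must still not outlive the object they point to.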

Page 35: C++ Advanced

shared_ptr<T> API

    // comes with an overhead of an extra integer
    // use it across multiple parties/modules, when there's no clear ownership of shared data

    template<class T> class shared_ptr {
    public:
        typedef T element_type;

        shared_ptr();                                                   // never throws
        // Constructs a shared_ptr that owns the pointer p.
        template<class Y> explicit shared_ptr(Y * p);
        // Constructs a shared_ptr that owns the pointer p and the deleter d.
        template<class Y, class D> shared_ptr(Y * p, D d);
        ~shared_ptr();                                                  // never throws

        shared_ptr(shared_ptr const & r);                               // never throws
        template<class Y> shared_ptr(shared_ptr<Y> const & r);          // never throws
        template<class Y> explicit shared_ptr(weak_ptr<Y> const & r);
        // clarify how one type is converted to other
        template<class Y> explicit shared_ptr(std::auto_ptr<Y> & r);

        shared_ptr & operator=(shared_ptr const & r);                   // never throws
        template<class Y> shared_ptr & operator=(shared_ptr<Y> const & r); // never throws
        template<class Y> shared_ptr & operator=(std::auto_ptr<Y> & r);

        void reset();                                                   // never throws
        template<class Y> void reset(Y * p);
        template<class Y, class D> void reset(Y * p, D d);

        T & operator*() const;                                          // never throws
        T * operator->() const;                                         // never throws
        T * get() const;                                                // never throws
        bool unique() const;                                            // never throws
        long use_count() const;                                         // never throws
        operator unspecified-bool-type() const;                         // never throws
        void swap(shared_ptr & b);                                      // never throws
    };

Page 36: C++ Advanced

Loops with shared pointers

    struct L {
        Data d;
        shared_ptr<M> mptr;
    };

    struct M {
        Stuff s;
        shared_ptr<L> lptr;
    };

[Figure: p1 points to L (members d, mptr); L::mptr points to M (members s, lptr); M::lptr points back to L.]

While p1 is in scope:

    p1.use_count() -> 2
    lptr.use_count() -> 2
    mptr.use_count() -> 1

After p1 goes out of scope:

    p1.use_count() -> 0
    lptr.use_count() -> 1
    mptr.use_count() -> 1

In case p1 goes out of scope, you lose your only handle to these two structures, and they are left in memory with no way of ever freeing them!

Page 37: C++ Advanced

weak_ptr

    struct L {
        Data d;
        shared_ptr<M> mptr;
    };

    struct M {
        Stuff s;
        weak_ptr<L> lptr;
    };

[Figure: p1 points to L (members d, mptr); L::mptr points to M (members s, lptr); M::lptr is a weak reference back to L.]

1.  While p1 is in scope:
    p1.use_count() -> 1, lptr.use_count() -> 1, mptr.use_count() -> 1

2.  p1 goes out of scope, so L loses its only owner:
    p1.use_count() -> 0, lptr.use_count() -> 0, mptr.use_count() -> 1

3.  L is destroyed, which releases mptr:
    lptr.use_count() -> 0, mptr.use_count() -> 0

4.  M is destroyed; nothing is left in memory.

The weak_ptr class template stores a "weak reference" to an object that's already managed by a shared_ptr.
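The difference between a shared and a weak back pointer shows up directly in use_count(). The sketch below assumes a C++11 standard library; the struct names and the `live` counter are hypothetical, and the cyclic variant breaks its cycle by hand before returning so that the example itself does not leak.

```cpp
#include <cassert>
#include <memory>

static int live = 0;   // hypothetical counter of nodes currently alive

// Variant 1: back pointer is a shared_ptr (forms an ownership cycle).
struct CycleM;
struct CycleL { std::shared_ptr<CycleM> mptr; CycleL() { ++live; } ~CycleL() { --live; } };
struct CycleM { std::shared_ptr<CycleL> lptr; CycleM() { ++live; } ~CycleM() { --live; } };

// Variant 2: back pointer is a weak_ptr (no cycle).
struct WeakM;
struct WeakL { std::shared_ptr<WeakM> mptr; WeakL() { ++live; } ~WeakL() { --live; } };
struct WeakM { std::weak_ptr<WeakL> lptr;   WeakM() { ++live; } ~WeakM() { --live; } };

long owners_with_shared_back_pointer() {
    std::shared_ptr<CycleL> p1(new CycleL);
    p1->mptr.reset(new CycleM);
    p1->mptr->lptr = p1;                 // M now co-owns L
    long owners = p1.use_count();        // p1 plus M::lptr
    p1->mptr->lptr.reset();              // break the cycle by hand to avoid the leak
    return owners;
}

long owners_with_weak_back_pointer() {
    std::shared_ptr<WeakL> p1(new WeakL);
    p1->mptr.reset(new WeakM);
    p1->mptr->lptr = p1;                 // M only observes L
    return p1.use_count();               // p1 alone; both nodes are freed on return
}
```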

Page 38: C++ Advanced

weak_ptr behavior

1.  shared_ptr<X> p1(new X);
    shared_ptr<X> p2 = p1;
    weak_ptr<X> w0 = p1;
    w0.use_count() -> 2 (only the shared pointers are counted), w0.expired() -> false
    State: p1 and p2 own X; w0 observes it.

2.  weak_ptr<X> w1 = w0;
    w0.use_count() -> 2
    State: p1 and p2 own X; w0 and w1 observe it.

3.  // p1 goes out of scope
    w0.use_count() -> 1

4.  // w1 goes out of scope
    w0.use_count() -> 1

5.  // p2 goes out of scope
    w0.use_count() -> 0, w0.expired() -> true
    State: X is destroyed; w0 no longer refers to anything.

You cannot have a weak pointer pointing to an object that's not pointed to by a shared_ptr; a w0 that alone keeps X alive cannot happen.

A weak_ptr will never call the destructor of the object it points to. Use the weak pointer as an observer of data owned and managed by shared pointer(s). Note that weak_ptr offers use_count() and expired(), but no unique() and no conversion to bool.

Page 39: C++ Advanced

weak_ptr behavior

1.  shared_ptr<X> p1(new X);
    weak_ptr<X> w0 = p1;
    p1.unique() -> true, w0.use_count() -> 1
    State: p1 owns the first X; w0 observes it.

2.  // shared_ptr<X> p2 = w0;       // error
    shared_ptr<X> p3(w0);           // explicit
    // or
    shared_ptr<X> p3 = w0.lock();
    p1.unique() -> false, w0.use_count() -> 2
    State: p1 and p3 own the first X; w0 observes it.

3.  shared_ptr<X> p4(new X);
    State: p4 owns a second X.

4.  weak_ptr<X> w1 = p4;
    State: w1 observes the second X.

5.  p1.swap(p4);
    State: p1 owns the second X; p3 and p4 own the first X.

6.  w0.swap(w1);
    State: w0 observes the second X; w1 observes the first X.
    // w1.lock() == p3 == p4
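The observer role of weak_ptr can be sketched with lock() and expired(). This assumes a C++11 standard library; `read_or` and `observe_lifetime` are hypothetical names.

```cpp
#include <cassert>
#include <memory>

struct X {
    int value;
    explicit X(int v) : value(v) {}
};

// Reads through the observer if the object is still alive, else returns fallback.
int read_or(const std::weak_ptr<X>& observer, int fallback) {
    // lock() returns a shared_ptr that keeps X alive for the duration of the access,
    // or an empty shared_ptr if X has already been destroyed.
    if (std::shared_ptr<X> owner = observer.lock())
        return owner->value;
    return fallback;
}

// Checks the observer before and after the only owner goes out of scope.
bool observe_lifetime() {
    std::weak_ptr<X> w0;
    {
        std::shared_ptr<X> p1(new X(42));
        w0 = p1;
        if (w0.expired() || w0.use_count() != 1 || read_or(w0, -1) != 42)
            return false;
    }   // p1 is destroyed here, and X with it
    return w0.expired() && w0.use_count() == 0 && read_or(w0, -1) == -1;
}
```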

Page 40: C++ Advanced

weak_ptr API

    template<class T> class weak_ptr {
    public:
        typedef T element_type;

        weak_ptr();

        template<class Y> weak_ptr(shared_ptr<Y> const & r);
        weak_ptr(weak_ptr const & r);
        template<class Y> weak_ptr(weak_ptr<Y> const & r);

        ~weak_ptr();

        weak_ptr & operator=(weak_ptr const & r);
        template<class Y> weak_ptr & operator=(weak_ptr<Y> const & r);
        template<class Y> weak_ptr & operator=(shared_ptr<Y> const & r);

        long use_count() const;
        bool expired() const;
        shared_ptr<T> lock() const;

        void reset();
        void swap(weak_ptr<T> & b);
    };

Page 41: C++ Advanced

Recap C-style array

    const string def = "default_string";

    class X {
    public:
        string st;
        X(const string & s = def) : st(s) {}
        ~X() {}
    };

    X c[3]; // initializes 3 default Xs in static memory

    int main(void)
    {
        int i[3];                          // 3 ints on stack memory
        X m[3];                            // initializes 3 default Xs on stack memory
        X p[3] = { X("I"), X(" am ") };    // "I" " am " "default_string"
        // 6 X destructor calls for m[] and p[]
        return 0;
    }

    // after main terminates
    // 3 X destructor calls for c[]

C-style arrays have
• no bounds checking
• no proper copy semantics
• no STL-like iterators
• no swap functionality
• no block initialization
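For comparison, the array class template covered later in the course addresses several items on this list. A minimal sketch, assuming a C++11 standard library (`std::array`); `array_features` is a hypothetical function name.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <stdexcept>

// Exercises copy, iterators, swap and checked access on std::array.
bool array_features() {
    std::array<int, 3> a = {{ 3, 1, 2 }};
    std::array<int, 3> b = a;               // proper copy semantics

    std::sort(b.begin(), b.end());          // STL-like iterators: b is now 1 2 3
    a.swap(b);                              // swap: a is 1 2 3, b is 3 1 2

    bool caught = false;
    try {
        a.at(3) = 0;                        // bounds-checked access, index 3 is out of range
    } catch (const std::out_of_range&) {
        caught = true;
    }
    return caught && a[0] == 1 && b[0] == 3 && a.size() == 3;
}
```

Note that `operator[]` on std::array is still unchecked; only `at()` reports an out-of-range index.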

Page 42: C++ Advanced

Recap C-style arrays; Overloading operators new[] and delete[] in a plain old data class (POD)

class X{public: int luba; static char BUF[10000]; static int cnt; void set(int ii) { luba = ii;}

void * operator new (size_t sz){ void * retval = &BUF[cnt]; cnt += sz; return retval; }

void * operator new [] (size_t sz){ void * retval = &BUF[cnt]; cnt += sz; return retval; }

void operator delete(void * todel){ // currently is a no-op } void operator delete [] (void * todel){ // currently is a no-op } };

char X::BUF[10000];int X::cnt = 0;

int main(void){ X * x1 = new X; x1->set(1); X * xa = new X[3]; xa[0].set(4); xa[1].set(4); xa[2].set(4); X * xaa = new X[2]; xaa[0].set(7); xaa[1].set(7);

delete x1; delete [] xa; delete [] xaa; return 1;}

[Diagram, labeled "Heap memory": BUF contains 1 4 4 4 7 7; x1 points at the 1, xa[0] at the first 4, xaa[0] at the first 7]

new X calls operator new(4)

new X[3] calls operator new[](12)

Page 43: C++ Advanced

C++ Classes; Overloading operators new[] and delete[] in a class with constructor (non POD)

class X{public: int luba; static char BUF[10000]; static int cnt; void set(int ii) { luba = ii;} void * operator new (size_t sz){ void * retval = &BUF[cnt]; cnt += sz; return retval; }

void * operator new [] (size_t sz){ void * retval = &BUF[cnt]; cnt += sz; return retval; }

void operator delete(void * todel){ // currently is a no-op } void operator delete [] (void * todel){ // currently is a no-op } X(int ii = 9) : luba(ii){} ~X(void) { luba = -1;}};

char X::BUF[10000];int X::cnt = 0;

int main(void){ X * x1 = new X; x1->set(1); X * xa = new X[3]; xa[0].set(4); xa[1].set(4); xa[2].set(4); X * xaa = new X[2]; xaa[0].set(7); xaa[1].set(7);

delete x1; delete [] xa; return 1;}

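[Diagram, labeled "Heap memory": BUF contains 1 | 3 0 4 4 4 | 2 0 7 7. x1 points at the 1. The 3 0 in front of xa[0] stores the number of elements in xa[]; the 2 0 in front of xaa[0] stores the number of elements in xaa[].]

new X[3] calls operator new[](20). It allocates from address A (the start of the element count) but returns address B (the address of xa[0]) !!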

Page 44: C++ Advanced

array<class T, size_t N>

array<X, 3> a; // initializes 3 default Xs

array<X, 3> b{X("I"), X("am")}; // "I" "am" "default_string"

array<X, 3> d(b); // copy constructor

array<X, 3> e = {X("how"), X("are"), X("you?")};

array<X, 3> f = {X("I"), X("am"), X("fine.")};

e.swap(f);

array<X, 4> g;
g.swap(f); // compiler error, bad argument type
f.swap(g); // compiler error, bad argument type

[Diagram: contents of the arrays
a: default_string default_string default_string
b: I am default_string
d: I am default_string
e: how are you?   f: I am fine.   (exchanged by e.swap(f))
g (4 elements) and f (3 elements) cannot be swapped]

Page 45: C++ Advanced

array<class T, size_t N>

It has all the standard iterators

T & front() T & back()

iterator begin() iterator end()

reverse_iterator rend() reverse_iterator rbegin()

Page 46: C++ Advanced

array<class T, size_t N>

array<X, 3> a = {X("The"), X("Seinfeld"), X("show")};

a.assign("blah"); // error, need X as argument

a.assign(X("yadda")); // OK, sets all 3 elements (assign is the TR1 name; C++11 calls it fill)

a[0]; // returns reference to elem 0

a[3]; // returns reference to non-existing elem – no bounds check

a.at(0); // bounds-checked, returns reference to elem 0

a.at(3); // throws std::out_of_range exception

a.size(); // returns 3

a.max_size(); // returns 3 (for array, max_size() equals size())

a.empty(); // return false

array<X, 0> x;x.empty(); // returns true

a[i]; // i is runtime variable

[Diagram: a initially holds The Seinfeld show; after assign it holds yadda yadda yadda. a[0] and a[3] are unchecked accesses; a.at(0) is checked and a.at(3) throws.]

Page 47: C++ Advanced

array<class T, size_t N>

array<X, 3> a = {X("The"), X("Seinfeld"), X("show")};

// tuple-like element access
get<1>(a); // returns Seinfeld

get<3>(a); // compile-time error, out of bounds

get<i>(a); // OK if i is a compile-time constant

a.data(); // returns the address of a[0]

// 2D array
array<array<int, 3>, 4> myarr2d2;
myarr2d2[3][2] = 3;

// 3D array
array<array<array<int, 3>, 12>, 33> my3Darray;

// etc.
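The checked access, tuple-like access and nested arrays shown on the array slides can be exercised with a small sketch. It assumes C++11 std::array with std::string elements instead of the slides' X class; the function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <stdexcept>
#include <string>

// Returns true if at() rejects an out-of-range index by throwing.
bool at_throws_out_of_range() {
    std::array<std::string, 3> a = {{"The", "Seinfeld", "show"}};
    try {
        (void)a.at(3);           // index 3 does not exist in a 3-element array
    } catch (const std::out_of_range&) {
        return true;
    }
    return false;
}

// Compile-time indexed access; an index of 3 here would not compile.
std::string second_word() {
    std::array<std::string, 3> a = {{"The", "Seinfeld", "show"}};
    return std::get<1>(a);
}

// Nested arrays form a fixed-size 2D grid: 4 rows of 3 ints.
int grid_cell() {
    std::array<std::array<int, 3>, 4> grid = {};  // zero-initialized
    grid[3][2] = 3;
    return grid.at(3).at(2);
}
```

Note that operator[] performs no check at all, so at() is the safer choice whenever the index comes from run-time data.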

Page 48: C++ Advanced

forward_list

#include <forward_list>#include <list>#include <iostream>using namespace std;

void p(forward_list<int> & fl){ for(auto itr = fl.begin(); itr != fl.end(); ++itr){ cout << *itr << " "; } cout << "\n";}

int main(void){ forward_list<int> fl1(2, 7); // create a list of 2 ints, each with value 7
 p(fl1); // 7 7

fl1.push_front(5); p(fl1); // 5 7 7

forward_list<int>::iterator lit; lit = fl1.begin(); fl1.insert_after(++lit, 6); p(fl1); // 5 7 6 7

fl1.pop_front(); p(fl1); // 7 6 7

fl1.push_front(10); p(fl1); // 10 7 6 7

continued…   

Page 49: C++ Advanced

forward_list

… continued

fl1.push_front(10); p(fl1); // 10 7 6 7

fl1.push_front(10); p(fl1); // 10 10 7 6 7

fl1.unique(); // removes consecutive duplicate elements
p(fl1); // 10 7 6 7

forward_list<int> fl3 = {1, 2, 3};
forward_list<int> fl2 = {8, 9};

p(fl3); // 1 2 3

fl3.merge(fl1); // merge() expects both lists to be sorted; fl1 is not, so the result below is what was observed, not a guaranteed outcome
p(fl3); // 1 2 3 10 7 6 7

fl3.splice_after(fl3.begin(), fl2); p(fl3); // 1 8 9 2 3 10 7 6 7

return 0;}
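The forward_list operations walked through above can be condensed into a self-checking sketch. It assumes the final C++11 std::forward_list interface and, unlike the slide, feeds merge() two sorted lists; to_vector and forward_list_demo are hypothetical helper names.

```cpp
#include <cassert>
#include <forward_list>
#include <vector>

// Copies a forward_list into a vector so results are easy to compare.
std::vector<int> to_vector(const std::forward_list<int>& fl) {
    return std::vector<int>(fl.begin(), fl.end());
}

std::vector<int> forward_list_demo() {
    std::forward_list<int> fl = {7, 7};
    fl.push_front(5);                       // 5 7 7
    fl.insert_after(fl.begin(), 6);         // 5 6 7 7
    fl.unique();                            // 5 6 7 (adjacent duplicates removed)

    std::forward_list<int> other = {1, 8};  // already sorted, as merge() expects
    fl.merge(other);                        // 1 5 6 7 8, other is now empty

    std::forward_list<int> extra = {100, 200};
    fl.splice_after(fl.before_begin(), extra);  // 100 200 1 5 6 7 8
    return to_vector(fl);
}
```

Because the list is singly linked, every modifier works on the position after an iterator, which is why before_begin() exists for operations at the head.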

   

Page 50: C++ Advanced

forward_list

    // forward_list reference API

    iterator before_begin();    const_iterator before_begin() const;    iterator begin();     const_iterator begin() const;     iterator end();     const_iterator end() const;

    const_iterator cbegin() const;    const_iterator cbefore_begin() const;    const_iterator cend() const;

    // capacity:     bool empty() const;     size_type max_size() const;

    // 23.2.x.3 element access:     reference front();     const_reference front() const;

    // 23.2.x.4 modifiers:     template <class... Args> void push_front(Args&&... args);     void pop_front();       template <class... Args> iterator emplace_after(const_iterator position, Args&&... args);    iterator insert_after(const_iterator position, const T& x);    iterator insert_after(const_iterator position, T&& x);

    void insert_after(const_iterator position, size_type n, const T& x);     template <class InputIterator>       void insert_after(const_iterator position, InputIterator first, InputIterator last);

    iterator erase_after(const_iterator position);     iterator erase_after(const_iterator position, iterator last);     void swap(forward_list<T,Allocator>&&);

    void resize(size_type sz);    void resize(size_type sz, value_type c);    void clear();

   

Page 51: C++ Advanced

forward_list

        void splice_after(const_iterator position, forward_list<T,Allocator>&& x);     void splice_after(const_iterator position, forward_list<T,Allocator>&& x, const_iterator i);     void splice_after(const_iterator position, forward_list<T,Allocator>&& x, const_iterator first, const_iterator last);

    void remove(const T& value);     template <class Predicate> void remove_if(Predicate pred);

    void unique();     template <class BinaryPredicate> void unique(BinaryPredicate binary_pred);

    void merge(forward_list<T,Allocator>&& x);     template <class Compare> void merge(forward_list<T,Allocator>&& x, Compare comp);

    void sort();     template <class Compare> void sort(Compare comp);

    void reverse();  

Page 52: C++ Advanced

rvalue reference

#include <forward_list>#include <list>#include <iostream>using namespace std;

int glob_int = 22;

int get_some_int(void) { return 9; }int & get_some_ref_to_int(void) { return glob_int; }

int main(void){ int i = 5; int & j = i; // lvalue reference to i (alias to i)

j = 7; // sets i to 7

int && k = 8; // rvalue reference to temporary int && m = get_some_int(); // rvalue reference to a temporary returned by value

int const & n = 8; // const ref int const & o = get_some_int(); // const reference to a temporary returned by value

int & p = 9; // error int & q = get_some_int(); // error int & r = get_some_ref_to_int(); // okay

return 0;}
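The binding rules listed above can be observed through overload resolution. The following is a minimal sketch, assuming a C++11 compiler; category and the small wrapper functions are hypothetical names.

```cpp
#include <cassert>
#include <string>

// Overloads reveal which reference category an argument binds to.
std::string category(int&)       { return "lvalue"; }
std::string category(const int&) { return "const lvalue"; }
std::string category(int&&)      { return "rvalue"; }

int make_int() { return 9; }      // returns a temporary

std::string named_variable() { int i = 5; return category(i); }
std::string literal()        { return category(8); }
std::string returned_value() { return category(make_int()); }
std::string named_rvalue_ref() {
    int&& k = 8;
    return category(k);           // k has a name, so it is an lvalue here
}
```

The last case is the one that surprises most readers: a variable declared as int && is itself an lvalue, which is why std::move is needed to pass it on as an rvalue.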

Page 53: C++ Advanced

rvalue reference

// /usr/intel/pkgs/gcc/4.4.0/bin/g++ -fno-elide-constructors -std=c++0x -g rvalue_ref.cc
// -fno-elide-constructors prevents return value optimization in the compiler

#include <cstring>

class X
{
public:
    char * c;

    X(int sz) : c(new char[sz + 1]) { c[0] = 0; } // +1 for the terminating NUL
    X(const char * s) : c(new char[strlen(s) + 1]) { strcpy(c, s); }
    X(X const & rs) : c(new char[strlen(rs.c) + 1]) { strcpy(c, rs.c); } // deep copy construct
    ~X() { delete [] c; }

    X operator + (X const & rs) const
    {
        X retval(strlen(c) + strlen(rs.c));
        strcpy(retval.c, c);
        strcat(retval.c, rs.c);
        return retval;
    }

    X & operator = (X const & rs)
    {
        if (this != &rs) {
            delete [] c;                    // array form of delete
            c = new char[strlen(rs.c) + 1]; // strlen, not sizeof (sizeof gives the pointer size)
            strcpy(c, rs.c);
        }
        return * this;
    }
};

int main(void)
{
    X x1("a"), x2("bb"), x3("ccc"), x4("dddd");
    x1 = x2 + x3 + x4;
    return 0;
}

Output (the tracing statements are omitted from the listing above):
ct char * a
ct char * bb
ct char * ccc
ct char * dddd
op + returning by value
ct with size: 5
deep copy ct with size 5
op + returning by value
ct with size: 9
deep copy ct with size 9
op = deep copy

3 deep copies!

Page 54: C++ Advanced

rvalue reference

#include <cstring>
#include <iostream>
#include <utility>
using namespace std;

class X
{
public:
    char * c;
    X(int sz) : c(new char[sz + 1]) { c[0] = 0; } // +1 for the terminating NUL
    X(const char * s) : c(new char[strlen(s) + 1]) { strcpy(c, s); }
    X(X const & rs) : c(new char[strlen(rs.c) + 1]) { strcpy(c, rs.c); } // deep copy construct
    X(X && rs) : c(rs.c) { rs.c = 0; } // shallow copy, modifies rs !!
    ~X() { delete [] c; }

    X operator + (X const & rs) const
    {
        cout << "op + returning by value" << endl;
        X retval(strlen(c) + strlen(rs.c));
        strcpy(retval.c, c);
        strcat(retval.c, rs.c);
        return std::move(retval); // gives its target the value of its argument, but is not obliged
                                  // to preserve the value of its source; returns X &&
    }
    X & operator = (X && rs) // modifies rs !!
    {   // doing a shallow copy with move semantics
        if (this != &rs) { delete [] c; c = rs.c; rs.c = 0; }
        return * this;
    }
    X & operator = (X const & rs)
    {
        if (this != &rs) { delete [] c; c = new char[strlen(rs.c) + 1]; strcpy(c, rs.c); }
        return * this;
    }
};

int main(void)
{
    X x1("a"), x2("bb"), x3("ccc"), x4("dddd");
    x1 = x2 + x3 + x4;
    return 0;
}

Output (the remaining tracing statements are omitted from the listing above):
ct char * a
ct char * bb
ct char * ccc
ct char * dddd
op + returning by value
ct with size: 5
copy ct shallow
op + returning by value
ct with size: 9
copy ct shallow
shallow op=

No deep copies !!
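The "no deep copies" claim can be verified by counting them. The sketch below is not the slide's class: Buffer is a hypothetical stand-in that owns a C string, counts deep copies in a static member, and uses a by-value assignment operator (an assumption made to keep the code short).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <utility>

// Hypothetical string-like class that counts its deep copies.
class Buffer {
public:
    static int deep_copies;
    char* data;

    explicit Buffer(std::size_t len) : data(new char[len + 1]) { data[0] = '\0'; }
    explicit Buffer(const char* s) : Buffer(std::strlen(s)) { std::strcpy(data, s); }
    Buffer(const Buffer& rhs) : Buffer(std::strlen(rhs.data)) {
        std::strcpy(data, rhs.data);
        ++deep_copies;                       // only the copy constructor duplicates memory
    }
    Buffer(Buffer&& rhs) noexcept : data(rhs.data) { rhs.data = nullptr; }
    Buffer& operator=(Buffer rhs) noexcept { // parameter is copied or moved, then swapped in
        std::swap(data, rhs.data);
        return *this;
    }
    ~Buffer() { delete[] data; }

    Buffer operator+(const Buffer& rhs) const {
        Buffer result(std::strlen(data) + std::strlen(rhs.data));
        std::strcpy(result.data, data);
        std::strcat(result.data, rhs.data);
        return result;                       // moved or elided, never deep-copied
    }
};
int Buffer::deep_copies = 0;

// Runs the chained concatenation from the slide and reports the deep copies made.
int copies_for_chained_concat() {
    Buffer::deep_copies = 0;
    Buffer x1("a"), x2("bb"), x3("ccc"), x4("dddd");
    x1 = x2 + x3 + x4;
    assert(std::strcmp(x1.data, "bbcccdddd") == 0);
    return Buffer::deep_copies;
}
```

Every temporary produced by operator+ is an rvalue, so the move constructor is selected each time and the counter stays at zero.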

Page 55: C++ Advanced

rvalue reference

template <class T>
void swap1(T & a, T & b) // version 1. baseline version
{
    T tmp(a); // creating one temporary; (tmp, a, b) now hold a a b
    a = b;    // now a b b
    b = tmp;  // now a b a
}

template <class T>
void swap2(T & a, T & b) // version 2. using move semantics
{
    T tmp(std::move(a)); // line of interest
    a = std::move(b);
    b = std::move(tmp);
}

int main(void)
{
    vector<int> v1(1000000000); // 1 billion ints = 4 GB
    vector<int> v2(1000000000);

    swap1(v1, v2); // 63s, 12 GB memory

    swap2(v1, v2); // 21s, 8 GB memory

    return 0;
}

[Diagram: memory held by a, b and tmp over time, one panel for swap1 and one for swap2]
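The effect of the move-based swap can be checked without a billion elements by comparing buffer addresses. This is a sketch assuming std::vector with the default allocator, where mainstream implementations hand over the heap buffer on a move; swap_by_move and buffers_were_exchanged are hypothetical names.

```cpp
#include <cassert>
#include <utility>
#include <vector>

template <class T>
void swap_by_move(T& a, T& b) {
    T tmp(std::move(a));   // steals a's buffer
    a = std::move(b);      // a takes over b's buffer
    b = std::move(tmp);    // b takes over the buffer that used to be a's
}

// Returns true if the two vectors exchanged their heap buffers
// instead of copying elements.
bool buffers_were_exchanged() {
    std::vector<int> v1(1000, 1), v2(2000, 2);
    const int* p1 = v1.data();
    const int* p2 = v2.data();
    swap_by_move(v1, v2);
    return v1.data() == p2 && v2.data() == p1
        && v1.size() == 2000 && v2.size() == 1000;
}
```

Since only three pointers per vector change hands, the cost no longer depends on the number of elements.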

Page 56: C++ Advanced

rvalue reference

template <class T>
void swap2(T & a, T & b) // version 2. using move semantics
{
    T tmp(std::move(a)); // line of interest
    …
}

// in file type_traits
template<typename _Tp> // primary class template
struct remove_reference
{ typedef _Tp type; };

template<typename _Tp> // specialized for _Tp&
struct remove_reference<_Tp&>
{ typedef _Tp type; }; // provides a typedef known at compile time, removes reference

// in file move.h, accepts either lvalue or rvalue argument
// returns it as an rvalue without triggering a copy construction
template<typename _Tp>
typename std::remove_reference<_Tp>::type&&
move(_Tp&& __t)
{ return __t; }

// the line of interest above transforms to
T tmp(a); // where now a is of type T &&

// and in the context of vector
vector<int> tmp(a); // where a is of type vector<int> &&
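The move() listing above reflects the draft-era library; under the final C++11 rules an lvalue no longer binds implicitly to an rvalue reference, so the conversion needs an explicit cast. The sketch below shows that form under the hypothetical name my_move; it is an illustration, not the library's actual source.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Casts its argument to an rvalue reference, whatever its value category was.
template <typename T>
typename std::remove_reference<T>::type&& my_move(T&& t) noexcept {
    return static_cast<typename std::remove_reference<T>::type&&>(t);
}

// An lvalue goes in, an rvalue reference comes out.
static_assert(std::is_same<decltype(my_move(std::declval<int&>())), int&&>::value,
              "lvalue argument yields int&&");

// The cast itself copies nothing; it only lets the move constructor be selected.
std::string steal(std::string& source) {
    std::string taken(my_move(source));
    return taken;
}
```

After steal() returns, source is left in a valid but unspecified state, so the caller should only assign to it or destroy it.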

Page 57: C++ Advanced

rvalue reference

// and in the context of vector

vector<int> tmp(a); // where a is of type vector<int> &&

// calls the move constructor of vector (the constructor that takes an rvalue reference)

vector(vector && __x) : _Base(std::forward<_Base>(__x)) { }

// forward just returns

template<typename _Tp>

_Tp && forward(typename std::identity<_Tp>::type && __t) { return __t; }

// move constructor of the vector base (takes an rvalue reference)

// it just hands over the memory from __x to self, makes no copy

// __x is empty after execution

_Vector_base(_Vector_base&& __x) : _M_impl(__x._M_get_Tp_allocator())

{

this->_M_impl._M_start = __x._M_impl._M_start;

this->_M_impl._M_finish = __x._M_impl._M_finish;

this->_M_impl._M_end_of_storage = __x._M_impl._M_end_of_storage;

__x._M_impl._M_start = 0;

__x._M_impl._M_finish = 0;

__x._M_impl._M_end_of_storage = 0;

}

vector& operator=(vector&& __x) // special version of op= taking an rvalue reference argument

{

this->clear();

this->swap(__x); // choice made by vector to implement op= using vector swap

return *this;

}

Page 58: C++ Advanced

rvalue reference

float time_it()

{

std::sort(v.begin(), v.end());

t2 = clock();

std::cout << "sort took " << (float)((t2 - t1)/(double)CLOCKS_PER_SEC) << std::endl;

std::rotate(v.begin(), v.begin() + v.size()/2, v.end());

t3 = clock();

std::cout << "rotate took " << (float)((t3 - t2)/(double)CLOCKS_PER_SEC) << std::endl;

}

/usr/intel/pkgs/gcc/4.4.0/bin/g++ -g -O2 rvalue2.cpp -o rvalue2.no_rr

Total time = 15.45

/usr/intel/pkgs/gcc/4.4.0/bin/g++ -g -O2 -std=c++0x rvalue2.cpp -o rvalue2.rr

Total time = 6.62

You can benefit from rvalue references and move semantics in the STL even without explicitly using (or knowing about) the X && x1 syntax. Just by compiling with the proper option, you enable the STL to eliminate several superfluous copies. In this example the total time drops from 15.45 to 6.62, roughly a 2.3x speedup.

Page 59: C++ Advanced

Recap: Custom Comparators for STL containers

struct Z{ int i, j; Z(int ii, int jj) : i(ii), j(jj) {}

bool operator < (const Z & rs) const { if(i < rs.i) return true; if(i == rs.i) return (j < rs.j); return false; }};

void bar(){ set<Z> xs; // using Z::op< by default xs.insert(Z(3, 4)); xs.insert(Z(3, 5)); xs.insert(Z(2, 6)); xs.insert(Z(2, 6));

// (2, 6)(3, 4)(3, 5) is the order // one element (2, 6) is dropped (==)}

Conditions that the comparator must satisfy:

Irreflexivity: f(x, x) must be false.                        a < a -> false
Antisymmetry: f(x, y) implies !f(y, x).                      a < b -> !(b < a)
Transitivity: f(x, y) and f(y, z) imply f(x, z).             a < b && b < c -> a < c
Equivalence: !f(x, y) and !f(y, x) imply x == y.             !(a < b) && !(b < a) -> a == b

Page 60: C++ Advanced

Recap: Custom Comparators for STL containers

struct Z{ int i, j; Z(int ii, int jj) : i(ii), j(jj) {}};

class ZComp : public binary_function<const Z &, const Z &, bool>
{
public:
    bool operator() (const Z & l, const Z & r) const // const, so the container can call it on a const comparator
    {
        if(l.i < r.i) return true;
        if(l.i == r.i) return (l.j < r.j);
        return false;
    }
};

void bar(){ set<Z, ZComp> xs; xs.insert(Z(3, 4)); xs.insert(Z(3, 5)); xs.insert(Z(2, 6)); xs.insert(Z(2, 6));

// (2, 6)(3, 4)(3, 5) is the order // one element (2, 6) is dropped (==)}

You can write this binary function if:
• Z has no op <
• you cannot modify Z
• you don't like the behavior of Z::op <
• your set is a set of Z*, in which case you don't want pointer comparison
• you want your own sorting behavior, different from Z::op <

// an uglier alternativebool zff(const Z & l, const Z & r){ if(l.i < r.i) return true; if(l.i == r.i) return (l.j < r.j); return false;}

void bar()
{
    // second template argument: the function pointer type declaration
    // constructor argument: the function pointer
    set<Z, bool (*) (const Z &, const Z &)> xs(zff);
}

Page 61: C++ Advanced

Recap: More sophisticated comparator predicates

class MyintComparator : public binary_function<int, int, bool>
{
    int referenceArgument_;
public:
    explicit MyintComparator(int refArg) : referenceArgument_(refArg) {}
    bool operator () (int l, int r) const // const, so containers can call it on a const comparator
    {
        return std::abs(l - referenceArgument_) < std::abs(r - referenceArgument_);
    }
};

void foo(){ std::list<int> myList; myList.push_back(2); myList.push_back(11); myList.push_back(8); myList.sort(MyintComparator(10)); // 11, 8, 2 std::set<int, MyintComparator> mySet(MyintComparator(7)); mySet.insert(2); mySet.insert(11); mySet.insert(8); // 8, 11, 2

// you can also do this with boost bind, see later return;}

[Diagram: two number lines running from 2 to 12, one for each reference argument]
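The ordering produced by a stateful comparator can be confirmed with a short sketch. CloserTo and ordered_by_distance are hypothetical names; the comparator mirrors the distance-to-reference idea above without the binary_function base class.

```cpp
#include <cassert>
#include <cstdlib>
#include <set>
#include <vector>

// Orders ints by their distance from a reference value fixed at construction.
class CloserTo {
    int reference_;
public:
    explicit CloserTo(int reference) : reference_(reference) {}
    bool operator()(int l, int r) const {
        return std::abs(l - reference_) < std::abs(r - reference_);
    }
};

std::vector<int> ordered_by_distance(int reference) {
    CloserTo comp(reference);            // named object avoids a function-declaration parse
    std::set<int, CloserTo> s(comp);
    s.insert(2);
    s.insert(11);
    s.insert(8);
    return std::vector<int>(s.begin(), s.end());
}
```

One consequence of this design is that two values at the same distance from the reference (for example 6 and 8 with reference 7) are equivalent, so a set keeps only one of them.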

Page 62: C++ Advanced

Recap: Mistakes in op <

// Bad example
struct Z
{
    int i, j;
    Z(int ii, int jj) : i(ii), j(jj) {}

    bool operator < (const Z & rs) const
    {
        if(i > rs.i || j > rs.j) return false;
        return true;
    }
};

Z z1(2, 3), z2(3, 2);

// z1 < z2 -> false
// z2 < z1 -> false   (z1 and z2 are treated as equivalent although they differ)
// z1 < z1 -> true    (violates irreflexivity)
// for two equal values a and b, a < b and b < a are both true
// (violates antisymmetry)
//
// You can have all kinds of issues:
//
// not finding an inserted element
// not getting things back in the correct order
// multiple copies inserted in a set, etc.

Say you have a map, which is a red-black tree internally. The tree has the following elements in it:

    A
   / \
  B   C

and you are about to insert element D. The following comparisons will be done:

D < A
A < D
If both are false, D is considered equal to A and is inserted over A.

If D < A, it goes down the left branch and performs 2 more op < calls:

D < B
B < D
If both are false, it inserts D over B (or just drops D); if not, it branches further left or right from B.

If op < is incorrect, you may insert D and never find it in the set.

Page 63: C++ Advanced

Recap: Mistakes in op <

// Bad examplestruct Z{ int i, j; Z(int ii, int jj) : i(ii), j(jj) {}

};

class BadZComp : public binary_function<const Z &, const Z &, bool>{public: bool operator() (const Z & l, const Z & r) { return (l.i < r.i || l.i < r.j); // faulty implementation }};int main(void){ //set<Z> xs; set<Z, BadZComp> xs; xs.insert(Z(3, 4)); xs.insert(Z(3, 5)); xs.insert(Z(2, 6));

set<Z, BadZComp>::iterator sit = xs.find(Z(3, 4)); return 0;}

In order to find element Z(3, 4) in the set, the comparison between the stored element a and the search key b must satisfy

!(a < b) && !(b < a)

Z(3, 4) < Z(3, 4) will return true, therefore the element will never be found in the set.
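A faulty comparator can be caught before it ever reaches a container by testing the conditions directly on sample values. The sketch below checks only irreflexivity and antisymmetry; Pair, GoodComp, BadComp and the checker are hypothetical names, with BadComp reproducing the faulty expression from the slide.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Pair { int i, j; };   // hypothetical stand-in for Z

struct GoodComp {
    bool operator()(const Pair& l, const Pair& r) const {
        return l.i < r.i || (l.i == r.i && l.j < r.j);
    }
};
struct BadComp {             // the faulty expression from the slide
    bool operator()(const Pair& l, const Pair& r) const {
        return l.i < r.i || l.i < r.j;
    }
};

// Checks irreflexivity and antisymmetry on every pair of samples.
template <class Comp>
bool looks_like_strict_ordering(const std::vector<Pair>& samples, Comp comp) {
    for (std::size_t a = 0; a < samples.size(); ++a) {
        if (comp(samples[a], samples[a])) return false;       // x < x must be false
        for (std::size_t b = 0; b < samples.size(); ++b)
            if (comp(samples[a], samples[b]) && comp(samples[b], samples[a]))
                return false;                                  // x < y and y < x
    }
    return true;
}

std::vector<Pair> sample_pairs() {
    return {{3, 4}, {3, 5}, {2, 6}};
}
```

A check like this fits naturally into a unit test; it cannot prove a comparator correct, but it rejects the slide's BadZComp on the very first sample.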

Page 64: C++ Advanced

Unordered associative containers: unordered_map, unordered_multimap, unordered_set, unordered_multiset

class WWID {public: int wwid; WWID(int i = -1) : wwid(i) {} bool operator == (const WWID & rs) const { return wwid == rs.wwid; }};

class ShoeSize {public: int shoeSize; ShoeSize(int i = 4) : shoeSize(i) {} bool operator == (const ShoeSize & rs) const { return shoeSize == rs.shoeSize; }};

// Notice: no operator < () required on the types above – no ordering is happening

struct MyHash { // return values 0, 1 or 2 size_t operator() (const WWID & key) const { return key.wwid % 3; } };

• These four types implement the hash tables in TR1
• The name hash_table was widely used in legacy code, hence these new names
• These types are unordered, no operator < is required on them
• Operator == is required on the key
• The unordered_map and unordered_set keep unique copies of elements
• The unordered_multimap and unordered_multiset keep multiple elements with matching keys, lumped together in groups
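The two requirements on a key type, a hash function object and operator ==, can be seen in a compact sketch. It assumes C++11 std::unordered_map; EmployeeId, EmployeeIdHash and lookup_shoe_size are hypothetical names modeled on the WWID and ShoeSize example.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <utility>

struct EmployeeId {            // hypothetical key type, like the slide's WWID
    int id;
    bool operator==(const EmployeeId& rhs) const { return id == rhs.id; }
};

struct EmployeeIdHash {        // hash function object: reuse the library hash for int
    std::size_t operator()(const EmployeeId& key) const {
        return std::hash<int>()(key.id);
    }
};

// Returns the stored shoe size, or -1 if the id is unknown.
int lookup_shoe_size(int id) {
    std::unordered_map<EmployeeId, int, EmployeeIdHash> shoe_size;
    shoe_size.insert(std::make_pair(EmployeeId{0}, 5));
    shoe_size.insert(std::make_pair(EmployeeId{1}, 6));
    shoe_size.insert(std::make_pair(EmployeeId{1}, 9));  // key exists: ignored
    auto it = shoe_size.find(EmployeeId{id});
    return it == shoe_size.end() ? -1 : it->second;
}
```

No operator < appears anywhere: the container locates a bucket through the hash and then confirms the match with operator ==.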

Page 65: C++ Advanced

unordered_map

const MyHash myhash; // one object
// key is WWID, type is ShoeSize
unordered_map<WWID, ShoeSize, MyHash> map2(3, myhash); // need 3 buckets, use myhash
cout << map2.bucket_count(); // prints 8 (this implementation upsizes to a power of 2 >= 8)
cout << map2.max_bucket_count(); // prints 8
cout << map2.bucket_size(0); // prints 0
cout << map2.size(); // prints 0, the number of elements in the unordered_map

[Diagram: 8 empty buckets numbered 0 to 7, spanning bucket_count() and max_bucket_count(); bucket_size(0) == 0]

Page 66: C++ Advanced

unordered_map

map2.insert(pair<WWID, ShoeSize>(WWID(0), ShoeSize(5)));
// bs(0):1 bs(1):0 bs(2):0 size:1 b_cnt:8 max_b_cnt:8
map2.insert(make_pair(WWID(1), ShoeSize(6)));
// bs(0):1 bs(1):1 bs(2):0 size:2 b_cnt:8 max_b_cnt:8
map2.insert(pair<WWID, ShoeSize>(2, 7));
// bs(0):1 bs(1):1 bs(2):1 size:3 b_cnt:8 max_b_cnt:8
map2.insert(pair<WWID, ShoeSize>(3, 8));
// bs(0):2 bs(1):1 bs(2):1 size:4 b_cnt:8 max_b_cnt:8

[Diagram: buckets 0 to 7 (bucket_count()); bucket 0 holds (0, 5) and (3, 8), bucket 1 holds (1, 6), bucket 2 holds (2, 7); buckets 3 to 7 are empty]

Due to the implementation of MyHash::operator(), we will only insert elements in the first 3 buckets, even though the bucket_count of the map is 8.

Page 67: C++ Advanced

unordered_map

struct MyHash2 { // return values 0 thru 8
    size_t operator() (const WWID & key) const { return key.wwid % 9; }
};
MyHash2 myhash2;
unordered_map<WWID, ShoeSize, MyHash2> map2(3, myhash2); // need 3 buckets, use myhash2
map2.insert(pair<WWID, ShoeSize>(WWID(0), ShoeSize(5))); // bs(0):1 bs(1):0 bs(2):0 bs(7):0 size:1
map2.insert(make_pair(WWID(1), ShoeSize(6)));            // bs(0):1 bs(1):1 bs(2):0 bs(7):0 size:2
map2.insert(pair<WWID, ShoeSize>(2, 7));                 // bs(0):1 bs(1):1 bs(2):1 bs(7):0 size:3
map2.insert(pair<WWID, ShoeSize>(7, 12));                // bs(0):1 bs(1):1 bs(2):1 bs(7):1 size:4
map2.insert(pair<WWID, ShoeSize>(8, 13));                // bs(0):2 bs(1):1 bs(2):1 bs(7):1 size:5

[Diagram: buckets 0 to 7 (bucket_count()); bucket 0 holds (0, 5) and (8, 13), bucket 1 holds (1, 6), bucket 2 holds (2, 7), bucket 7 holds (7, 12); there is no bucket 8]

Trying to insert (8, 13) in bucket 8 is unsuccessful; it is inserted in bucket 0 instead.

The hash function returns 8 for (8, 13), but the value is truncated to limit it to buckets 0 thru 7:

value &= 0x07; // done automatically for you
               // having the bucket_count at a power of 2
               // comes in handy here

Page 68: C++ Advanced

unordered_map

cout << map2.load_factor(); // average load factor for a bucket; prints .625 (size()/bucket_count())
cout << map2.max_load_factor(); // prints 4
map2.max_load_factor(0.5); // sets new target load factor
map2.rehash(8); // rehash such that the load factor does not exceed the target load factor, add new buckets if needed
map2.max_load_factor(4); // sets target load factor back to 4
map2.rehash(8); // rehash, get back to original state, get at least 8 buckets

[Diagram, first state: bucket_count() == 8, load_factor() == 0.625 (5/8); bucket 0 holds (0, 5) and (8, 13), bucket 1 holds (1, 6), bucket 2 holds (2, 7), bucket 7 holds (7, 12).
After max_load_factor(0.5) and rehash(8): bucket_count() == 16 (buckets 0 to 15), load_factor() == 0.3125 (5/16); (8, 13) now sits in bucket 8.
After max_load_factor(4) and rehash(8): back to the first state.]

Page 69: C++ Advanced

unordered_map

struct MyHash { // return values 0, 1 or 2
    size_t operator() (const WWID & key) const { return key.wwid % 3; }
} myhash;
unordered_map<WWID, ShoeSize, MyHash> map2(3, myhash);
// insert the same 5 elements into map2
cout << map2.load_factor(); // average load factor for a bucket; prints .625 (size()/bucket_count())
cout << map2.max_load_factor(); // prints 4
map2.max_load_factor(0.5); // sets new target load factor
map2.rehash(8); // rehash such that the load factor does not exceed the target load factor, add new buckets if needed

[Diagram, before rehash(8): bucket_count() == 8, load_factor() == 0.625 (5/8); the five elements (0, 5), (1, 6), (2, 7), (7, 12), (8, 13) all sit in buckets 0, 1 and 2.
After rehash(8): bucket_count() == 16 (buckets 0 to 15), load_factor() == 0.3125 (5/16); the elements are still in buckets 0, 1 and 2.]

Elements are not moved to different buckets because the hash function returns only the values 0, 1, 2. We did get a lower load factor, but it is not useful: all elements will stay in buckets 0, 1, 2.
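The relationship between max_load_factor(), rehash() and load_factor() can be observed with the standard hash for int, which spreads keys well enough for the extra buckets to matter. This is a sketch assuming C++11 std::unordered_set; RehashReport and lower_load_factor are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>

struct RehashReport {
    std::size_t buckets_after;
    float load_after;
    std::size_t size_after;
};

// Inserts 100 keys, lowers the target load factor, rehashes and reports the result.
RehashReport lower_load_factor() {
    std::unordered_set<int> s;
    for (int k = 0; k < 100; ++k) s.insert(k);

    s.max_load_factor(0.25f);   // new target: at most one element per four buckets
    s.rehash(0);                // rebuild with enough buckets to meet the target

    RehashReport r;
    r.buckets_after = s.bucket_count();
    r.load_after = s.load_factor();
    r.size_after = s.size();
    return r;
}
```

The rehash never changes which elements are stored, only how many buckets they are spread over; whether that helps depends entirely on the quality of the hash function, as the MyHash example shows.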

Page 70: C++ Advanced

unordered_map

Internally implemented as one big list.

[Diagram: the elements (0, 5) (1, 6) (2, 7) (8, 13) (7, 12) (12, 7) (14, 6) (11, 17) (33, 5) form a single list; buckets 0 to 3 each cover a contiguous portion of it]

unordered_map<WWID, ShoeSize, MyHash>::iterator walks the whole list, from map2.begin() to map2.end()

unordered_map<WWID, ShoeSize, MyHash>::local_iterator walks the portion belonging to one bucket n, from map2.begin(n) to map2.end(n), for n = 0, 1, 2, 3
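The difference between the two iterator kinds can be made concrete by walking a map bucket by bucket. The sketch assumes C++11 std::unordered_map with int keys; ModThree, count_via_buckets and sample_map are hypothetical names, and the hash is deliberately poor, as on the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <unordered_map>

struct ModThree {   // deliberately poor hash for non-negative keys: values 0, 1, 2
    std::size_t operator()(int key) const { return static_cast<std::size_t>(key) % 3; }
};

// Walks every bucket with local iterators and counts the elements seen.
std::size_t count_via_buckets(const std::unordered_map<int, int, ModThree>& m) {
    std::size_t total = 0;
    for (std::size_t b = 0; b < m.bucket_count(); ++b)
        for (auto it = m.begin(b); it != m.end(b); ++it)
            ++total;
    return total;
}

std::unordered_map<int, int, ModThree> sample_map() {
    std::unordered_map<int, int, ModThree> m;
    m[0] = 5; m[1] = 6; m[2] = 7; m[7] = 12; m[8] = 13;
    return m;
}
```

Summing the per-bucket walks visits exactly the elements that a single begin() to end() traversal visits, and keys with equal hash values (1 and 7 here) always land in the same bucket.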

Page 71: C++ Advanced

unordered_map

struct MyHash { // return values 0, 1 or 2
    size_t operator() (const WWID & key) const { return key.wwid / 10; }
} myhash;
unordered_map<WWID, ShoeSize, MyHash> map2(8, myhash);
// insert (12,7), (0,5), (7,12), (1, 6), (8,13), (2,7)
map2.insert(make_pair(WWID(2), ShoeSize(7))); // attempting to re-insert (2,7)

[Diagram: bucket 0 covers the list portion (0, 5) (1, 6) (2, 7) (8, 13) (7, 12); bucket 1 holds (12, 7); bucket 2 is empty; the whole list runs from _list.begin() to _list.end()]

Internally implemented as one list; when we try to insert (2,7) for the second time, it searches backward within the list portion of bucket 0.

The insert attempt returns an iterator pointing to the original (2,7), paired with a bool false denoting that the element was already in the map.

Page 72: C++ Advanced

unordered_map

// returns an iterator to element (8, 13)
unordered_map<WWID, ShoeSize, MyHash>::iterator mit = map2.find(8);

// passing mit as hint iterator to speed up insertion
map2.insert(mit, make_pair(WWID(1), ShoeSize(7)));

[Diagram: bucket 0 covers (0, 5) (1, 6) (2, 7) (8, 13) (7, 12); bucket 1 holds (12, 7). mit points at (8, 13); backit and forwit move away from mit in both directions; result points at (1, 6).]

How the hint iterator mit is used is implementation dependent. An implementation could take a two-finger approach and search away from mit in both directions (Dinkumware didn't use the hint though).

The insert finds the already existing element (1, 6) and skips re-inserting. The keys of (1, 6) and (1, 7) match: insert looks for key 1, finds (1, 6) and does not insert (1, 7) over (1, 6). The slide reports the return value as the pair (result, false); in the final C++11 interface the hinted overload returns only the iterator result.

Page 73: C++ Advanced

unordered_multiset

unordered_multiset<WWID, MyHash> set1;
set1.insert(2);
set1.insert(3);
set1.insert(2);
set1.insert(3);
set1.insert(11);

unordered_multiset<WWID, MyHash>::iterator i = set1.find(2); // find one element with matching value

// return an iterator pair ranging over all the elements with matching values
pair<unordered_multiset<WWID, MyHash>::iterator, unordered_multiset<WWID, MyHash>::iterator> itpair = set1.equal_range(3);

for(unordered_multiset<WWID, MyHash>::iterator i1 = itpair.first; i1 != itpair.second; ++i1)
{
    cout << "i1 " << i1->wwid << endl; // WWID has no operator <<, so print the member
}
set1.insert(itpair.first, 3); // inserting elem 3 for the 3rd time, passing a hint iterator to speed up insertion
cout << set1.count(3); // prints 3

[Diagram: buckets 0, 1, 2; the elements 3 3, 11 and 2 2 form one list; i points at a 2; itpair.first and itpair.second delimit the run of 3s, which is walked with ++; the third 3 is inserted next to the existing ones]
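The guarantee that equal elements are grouped together is what makes equal_range() useful on a multiset. The sketch below assumes C++11 std::unordered_multiset with int elements and the default hash; matches_via_equal_range is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <unordered_set>

// Counts the elements equal to key by measuring the range equal_range() returns.
std::size_t matches_via_equal_range(int key) {
    std::unordered_multiset<int> s = {2, 3, 2, 3, 11};
    s.insert(3);                          // third copy of 3
    auto range = s.equal_range(key);      // [first, second) spans all matches
    return static_cast<std::size_t>(std::distance(range.first, range.second));
}
```

For a key that is absent, both iterators of the pair are equal, so the range is empty and the function returns 0.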

Page 74: C++ Advanced

time complexity of unordered associative containers

X a(n, hash_fun); // construct a container with at least n buckets, O(n)
X a(first, last, n, hash_fun); // construct a container with at least n buckets, then insert the elements in the
                               // iterator range into the container; average O(N), worst O(N^2), N is the number of elements inserted
a.insert(object); // average O(1), worst O(a.size())
a.insert(hint_iterator, object); // average O(1), worst O(a.size())
a.erase(key); // erases elements with matching key; average O(a.count(key)), worst O(a.size())
a.erase(it1, it2); // erases elements between the 2 iterators; average O(distance(it1, it2)), worst O(a.size())
a.clear(); // removes all elements; O(a.size())

a.find(k); // find element with matching key; average O(1), worst O(a.size())
a.count(k); // count elements with matching key; average O(1), worst O(a.size())
a.equal_range(k); // return a pair of iterators spanning the range with matching key; average O(a.count(k)), worst O(a.size())
a.bucket_size(n); // O(M) where M is the number of objects in bucket n

The ordered associative containers set, multiset, map, multimap use a red-black tree for their implementation; their insertion and query times are O(log(N)).

The unordered associative containers have average insertion and query times of O(1), but they need careful design and babysitting, or else they can deteriorate to O(n).

Page 75: C++ Advanced

mem_fn (member function) Function Template

Example of calling a C++ member function from C code. Educational purpose only. Do not do this at home!

C++ file:

// File t1.cc
#include <iostream.h>

void g(int i)
{
    cout << i;
}

class XYZ
{
public:
    int i;
    void luba(int ii); // non-static member function
};
void XYZ::luba(int ii)
{
    i = ii;
    cout << this << "," << i << endl;
}

// nm t1.o will report:
// Name         value  scope   type   subspace
// g__Fi       |    0| extern| entry| $CODE$
// luba__3XYZFi|    0| extern| entry| $CODE$

C file:

/* File t2.c */
#include <stdio.h>
#include <stdlib.h> /* for malloc */
/* declare a mangled C++ global function */
void g__Fi(int);
/* declare a mangled C++ member function */
void luba__3XYZFi(void *, int);

void f(void)
{
    int j = 66;
    g__Fi(j); /* will call g() from t1.cc */

    void * vp = malloc(100);

    /* will call void XYZ::luba(int); */
    luba__3XYZFi(vp, j); /* passing vp to become the this pointer ! */
}

/* nm t2.o will report */
/* f           |    0| extern| entry| $CODE$ */
/* g__Fi       |     | undef | code |        */
/* luba__3XYZFi|     | undef | code |        */

Page 76: C++ Advanced

mem_fn (member function) Function Template

class AA{ int i;public: AA(int j = 0) : i(j) {} static int f0(double d) { return static_cast<int>(d);} int f1(double d) const {return static_cast<int>(d * i);} const AA operator + (const AA & rs) const { AA retval; retval.i = i + rs.i; return retval; } int geti(void) const { return i; }};

int foo(const AA & a) { return a.geti();}

void mem_fn_test(void){ int (* fooptr) (const AA &) = foo; // just a pointer to a global function int (AA:: * f1ptr)(double)const = &AA::f1; // pointer to a member function of AA // pointer to a member operator of AA const AA (AA:: * operatorPlusPtr)(const AA &) const = &AA::operator +; int (* f0ptr)(double) = &AA::f0; // pointer to static member function AA a1(13); AA * a2 = new AA(14); shared_ptr<AA> a3(new AA(15)); weak_ptr<AA> a4 = a3;

// continued on next page

Page 77: C++ Advanced

mem_fn (member function) Function Template

    // continued from prev page
    mem_fn(f1ptr)(a1, 9.9);  // equivalent to a1.f1(9.9);
    mem_fn(f1ptr)(a2, 11.3); // equivalent to a2->f1(11.3);  notice that pointers and non-pointers
    mem_fn(f1ptr)(a3, 13.5); // equivalent to a3->f1(13.5);  are handled with the same syntax
    // mem_fn(f1ptr)(a4, 15.1); // would be a4->f1(15.1); doesn't compile (a4 is a weak_ptr)

    (f0ptr)(8.8); // ok, calling a static member function, does not need an object address for this
    (f1ptr)(8.8); // error, term does not evaluate to a function taking 1 argument

    AA a5 = mem_fn(operatorPlusPtr)(a2, a1);  // equivalent to a5 = (*a2) + a1
    AA a6 = mem_fn(operatorPlusPtr)(a1, *a2); // equivalent to a6 = a1 + (*a2)
    AA a7 = mem_fn(operatorPlusPtr)(a2, *a3); // equivalent to a7 = (*a2) + (*a3)
    AA a8 = mem_fn(operatorPlusPtr)(a3, *a2); // equivalent to a8 = (*a3) + (*a2)
}
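The uniform call syntax of mem_fn can be verified with three callers that differ only in how they hold the object. The sketch assumes C++11 std::mem_fn; Scaler is a hypothetical class modeled on AA::f1.

```cpp
#include <cassert>
#include <functional>
#include <memory>

class Scaler {            // hypothetical class, modeled on the slide's AA
    int factor_;
public:
    explicit Scaler(int f) : factor_(f) {}
    int scale(double d) const { return static_cast<int>(d * factor_); }
};

// The same wrapper accepts an object, a raw pointer and a smart pointer.
int call_on_object() {
    Scaler s(13);
    return std::mem_fn(&Scaler::scale)(s, 2.0);
}
int call_on_pointer() {
    Scaler s(14);
    return std::mem_fn(&Scaler::scale)(&s, 2.0);
}
int call_on_shared_ptr() {
    std::shared_ptr<Scaler> s(new Scaler(15));
    return std::mem_fn(&Scaler::scale)(s, 2.0);
}
```

This uniformity is what makes mem_fn convenient with algorithms: the same callable works on containers of objects, of pointers, or of smart pointers.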

Page 78: C++ Advanced

The reference_wrapper Class Template

class RR{ // sizeof(RR) == 4public: int & ir; RR(int & j) : ir(j) {}};

int b; // some global int

class PP
{ // sizeof(PP) == 4
public:
    reference_wrapper<int> irw;
    PP(int & k) : irw(k)
    {
        irw = b; // error under TR1 (the constructor from int & is explicit); C++11 accepts this and rebinds irw to b
    }
};

void test_reference_wrapper(void){ int ii(11), jj(12); RR r1(ii), r2(jj); r1 = r2; // error, don’t know how to copy references

PP p1(ii), p2(jj); cout << p1.irw.get(); // prints 11 p1 = p2; cout << p1.irw.get(); // prints 12}
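The rebinding behavior that makes PP assignable can be checked directly. The sketch assumes C++11 std::reference_wrapper; Holder and rebinding_demo are hypothetical names.

```cpp
#include <cassert>
#include <functional>

struct Holder {                       // hypothetical, like the slide's PP
    std::reference_wrapper<int> ref;
    explicit Holder(int& target) : ref(target) {}
};

// Returns a * 1000 + b after the assignment and the write below.
int rebinding_demo() {
    int a = 11, b = 12;
    Holder h1(a), h2(b);
    h1 = h2;              // compiler-generated assignment rebinds h1.ref to b
    h1.ref.get() = 99;    // writes through to b; a is untouched
    return a * 1000 + b;  // encodes both values: a == 11, b == 99
}
```

A class with a plain int & member has no usable assignment operator, whereas the wrapper gives assignment pointer-like rebind semantics while keeping reference-like access through get().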

Page 79: C++ Advanced

The reference_wrapper Class Template

class AA{
  int i;
public:
  AA(int j = 0) : i(j) {}
  static int f0(double d) { return static_cast<int>(d); }
  int f1(double d) const { return static_cast<int>(d * i); }
  const AA operator + (const AA & rs) const {
    AA retval;
    retval.i = i + rs.i;
    return retval;
  }
  int geti(void) const { return i; }
};

int foo(const AA & a) { return a.geti(); }
int bar(const AA & a) { return a.geti() + 2; }

void test_reference_wrapper2(void){
  typedef int (* const myfuntype) (const AA &);
  reference_wrapper<myfuntype> fooref(&foo);                  // creating a reference wrapper to foo()
  reference_wrapper<int (* const) (const AA &)> barref(&bar); // ref wrapper to bar()

  AA a1(12);
  fooref(a1); // calling foo
  barref(a1); // calling bar

  /* supposed to work but doesn't
  fooref = cref(bar);
  fooref(a1); // calling bar
  */
  reference_wrapper<int (AA:: * const)(double)const> memref(&AA::f1);
  mem_fn(memref.get())(a1, 8.7); // calling a member function via the reference wrapper
}

Page 80: C++ Advanced


The function Class Template

class AA{
  int i;
public:
  AA(int j = 0) : i(j) {}
  int geti(void) const { return i; } // used by foo() and bar()
};

int foo(const AA & a) { cout << "foo" << endl; return a.geti(); }
int bar(const AA & a) { cout << "bar" << endl; return a.geti(); }

void duba(void) { cout << "in dubai\n"; }

void test_function(void){
  function<void ()> fd(duba); // constructing a function object
  fd();                       // call duba

  typedef int (myfuntype) (const AA &); // typedef of a function (not a function pointer)
  function<myfuntype> fun1;             // defining empty function
  AA a1(12);
  fun1 = foo; // assigning a global function to fun1
  fun1(a1);   // calling fun1 with argument a1
  fun1 = bar;
  fun1(a1);   // calling bar
}

Page 81: C++ Advanced


The function Class Template

class YY{
public:
  int _y;
  YY(int y) : _y(y) {}
  int operator () () const {
    cout << "In YY::op(), this: " << hex << this << dec << "\n";
    return 0;
  }
};

int yme(){ cout << "In yme" << endl; return 5; }

void fxx(void) {}

void test_function(void){
  YY y1(13);              // one object
  function<int ()> fun2;  // function object returning int, taking no arguments
  fun2 = y1;              // copy of y1 made
  fun2();                 // call y1.op() - operates on a copy of y1 - 13
  y1._y = 55;             // modifying y1
  fun2();                 // still operates on the original copy of y1 - 13 (not 55 !)
  cout << "sizeof fun2 " << sizeof(fun2) << endl; // prints 4
  fun2 = YY(33);
  cout << (bool)(fun2.target<YY>()) << endl;      // true
  cout << (bool)(fun2.target<int ()>()) << endl;  // false

  fun2(); // calling YY::op() on another object
  cout << "fun2.target_type().name() " << fun2.target_type().name() << endl; // prints class YY
  // fun2.target_type() returns a reference to type_info object

  fun2 = yme;
  fun2(); // calling yme
  cout << "fun2.target_type().name() " << fun2.target_type().name() << endl; // prints int (__cdecl *)(void)
  cout << "fun2.target<YY> " << (bool)(fun2.target<YY>()) << endl;           // false
  cout << "fun2.target<yme> " << (bool)(fun2.target<int ()>()) << endl;
  // false: yme is stored as a function pointer (see target_type above),
  // so target<int (*)()>() is the query that matches

  //fun2 = fxx; // error, non-matching type
}

Page 82: C++ Advanced


The function Class Template

void test_function(void){
  YY y3(33), y4(44);
  function<int ()> fun3, fun4;
  cout << "about to fun3 = y3" << endl;
  fun3 = y3; // y3 copied by value
  cout << "about to fun4 = y4" << endl;
  fun4 = y4; // y4 copied by value

  cout << "calling fun3()" << endl; fun3();
  cout << "calling fun4()" << endl; fun4();

  cout << "fun3 = 0" << endl;
  fun3 = 0; // setting fun3 to zero

  fun3.swap(fun4); // swapping the 2 functions, fun3 holds copy of y4, fun4 is zero
  if(fun3) { cout << "calling fun3()" << endl; fun3(); } // executed on copy of y4
  if(fun4) { cout << "calling fun4()" << endl; fun4(); } // if(fun4) evaluates to false, no call

}
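The copy semantics, the empty state and swap can be checked with a short sketch. It assumes C++11 (std::function, and nullptr in place of the slide's 0); Counter and function_demo are hypothetical names. Wrapping the target in std::ref is one way to avoid the copy when the caller wants later changes to be visible.

```cpp
#include <cassert>
#include <functional>

// Hypothetical function object with state, similar to the slides' YY.
struct Counter {
    int value;
    explicit Counter(int v) : value(v) {}
    int operator()() const { return value; }
};

int function_demo() {
    Counter c(13);
    std::function<int()> f = c;            // stores a copy of c
    c.value = 55;                          // does not affect the stored copy
    assert(f() == 13);

    std::function<int()> g = std::ref(c);  // stores a reference to c instead
    assert(g() == 55);

    f = nullptr;                           // empty the function object
    assert(!f);
    f.swap(g);                             // f now refers to c, g is empty
    assert(f && !g);
    return f();
}
```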

Page 83: C++ Advanced


The bind Function Template

// Recap: std::bind1st, std::bind2nd

// creating a predicate
class myLessThan : public binary_function<int, int, bool>{
public:
  bool operator () (int a, int b) const { return a < b; }
};

bool myLessThan3(int a){
  return a < 3;
}

void test_std_bind(){
  array<int, 4> arr = {1, 2, 3, 4};
  int k;
  // standard count_if takes three arguments and returns the count
  k = count_if(arr.begin(), arr.end(), myLessThan3);                // sets k to 2
  k = count_if(arr.begin(), arr.end(), bind2nd(myLessThan(), 3));   // (*it < 3) sets k to 2
  k = count_if(arr.begin(), arr.end(), bind1st(myLessThan(), 3));   // (3 < *it) sets k to 1
}

Page 84: C++ Advanced


The bind Function Template

bool myLessThan3(int a){
  return a < 3;
}

bool myLessThanF(int a, int b){
  return a < b;
}

using namespace std::tr1::placeholders; // for _1, _2, etc

void test_bind(){
  bind(myLessThan3, _1)(4);         // 4 < 3 false
  bind(myLessThan3, 4)();           // 4 < 3 false
  bind(myLessThan3, 2)();           // 2 < 3 true
  bind(myLessThanF, 4, 5)();        // myLessThanF(4, 5) returns true
  bind(myLessThanF, 4, _1)(5);      // myLessThanF(4, 5) returns true
  bind(myLessThanF, _1, _2)(4, 5);  // myLessThanF(4, 5) returns true
  bind(myLessThanF, _2, _1)(4, 5);  // myLessThanF(5, 4) returns false
  bind(myLessThanF, _1, 5)(4);      // myLessThanF(4, 5) returns true

  // using runtime values
  int val1; cout << "val1: "; cin >> val1;
  int val2; cout << "\tval2: "; cin >> val2;
  bool b = bind(myLessThanF, val1, _1)(val2); // will perform myLessThanF(val1, val2)
}

Page 85: C++ Advanced


The bind Function Template

bool myLessThanF(int a, int b){
  return a < b;
}

void test_bind(){
  array<int, 4> arr = {1, 2, 3, 4};
  int k = 0;

  k = count_if(arr.begin(), arr.end(), bind(myLessThanF, 3, _1)); // k set to 1
  // element:   1     2     3     4
  // predicate: 3<1f  3<2f  3<3f  3<4t

  k = count_if(arr.begin(), arr.end(), bind(myLessThanF, _1, 3)); // k set to 2
  // element:   1     2     3     4
  // predicate: 1<3t  2<3t  3<3f  4<3f
}
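The two counts above can be reproduced with standard C++11 facilities. This is a sketch under that assumption: it uses std::bind and std::placeholders instead of the TR1 names, and the three-argument std::count_if, which returns the count. count_demo is a hypothetical name.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <functional>
#include <utility>

bool myLessThanF(int a, int b) { return a < b; }

// Returns {number of elements x with 3 < x, number of elements x with x < 3}.
std::pair<long, long> count_demo() {
    using namespace std::placeholders;
    std::array<int, 4> arr = {{1, 2, 3, 4}};

    // First argument fixed to 3: the predicate is 3 < x.
    long above = std::count_if(arr.begin(), arr.end(), std::bind(myLessThanF, 3, _1));
    // Second argument fixed to 3: the predicate is x < 3.
    long below = std::count_if(arr.begin(), arr.end(), std::bind(myLessThanF, _1, 3));

    assert(above + below <= static_cast<long>(arr.size()));
    return {above, below};
}
```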

Page 86: C++ Advanced


The bind Function Template

bool inBetweenF(int l, int m, int r){
  return (l <= m && m <= r);
}

void test_bind(){
  inBetweenF(7, 8, 9);                           // call (7, 8, 9) -> true
  bind(inBetweenF, _1, _2, _3)(7, 8, 9);         // call (7, 8, 9) -> true
  bind(inBetweenF, _1, _3, _2)(7, 8, 9);         // call (7, 9, 8) -> false
  bind(inBetweenF, 7, _1, _2) (8, 9);            // call (7, 8, 9) -> true
  bind(inBetweenF, 7, 7, _1) (8);                // call (7, 7, 8) -> true
  bind(inBetweenF, 7, _1, 8) (7);                // call (7, 7, 8) -> true
  bind(inBetweenF, _1, 7, 9) (6);                // call (6, 7, 9) -> true
  bind(inBetweenF, _1, _1, _1)(10, 11, 12);      // call (10, 10, 10) -> true
  bind(inBetweenF, _3, _4, _4)(11, 22, 33, 44);  // call (33, 44, 44) -> true; notice 4 arguments
}

Page 87: C++ Advanced


The bind Function Template

bool inBetweenF(int l, int m, int r){
  return (l <= m && m <= r);
}

void test_bind(){
  array<int, 4> arr = {1, 2, 3, 4};
  int k = 0;

  k = count_if(arr.begin(), arr.end(), bind(inBetweenF, 1, 2, _1)); // k set to 3
  // element:   1         2         3         4
  // predicate: (1,2,1)f  (1,2,2)t  (1,2,3)t  (1,2,4)t

  k = count_if(arr.begin(), arr.end(), bind(inBetweenF, 1, _1, 4)); // k set to 4
  // element:   1         2         3         4
  // predicate: (1,1,4)t  (1,2,4)t  (1,3,4)t  (1,4,4)t

  k = count_if(arr.begin(), arr.end(), bind(inBetweenF, 2, _1, 3)); // k set to 2
  // element:   1         2         3         4
  // predicate: (2,1,3)f  (2,2,3)t  (2,3,3)t  (2,4,3)f

  k = count_if(arr.begin(), arr.end(), bind(inBetweenF, _1, 2, 4)); // k set to 2
  // element:   1         2         3         4
  // predicate: (1,2,4)t  (2,2,4)t  (3,2,4)f  (4,2,4)f
}

Page 88: C++ Advanced


The bind Function Template

class WW{
public:
  int w_;
  WW(int w = 0) : w_(w) {}
  bool amIinBetween(int l, int r) { return (l <= w_) && (w_ <= r); }
  int operator () (int i) const { return i; }
  typedef int result_type;
};

class TT{
public:
  int t_;
  TT(int t = 0) : t_(t) {}
  int foo(int i, int j) const { return i + j + t_; }
};

void test_bind(void){
  array<WW, 3> war = {1, 2, 3};
  int k = 0;
  k = count_if(war.begin(), war.end(), bind(&WW::amIinBetween, _1, 2, 4)); // k set to 2

  // testing bind chaining
  WW w1(11), w2(33);
  w1(3); // call w1.op()(3)
  TT t1(13);
  bind(w1, 22)();             // equivalent to w1(22)
  bind(w2, bind(w1, 7)())();  // eq to w2(w1(7))

  bind(&TT::foo, t1, 5, 6)();             // calls t1.foo(5, 6)
  bind(&TT::foo, _3, _2, _1)(5, 6, t1);   // calls t1.foo(6, 5)

  // supposed to work but doesn't compile
  //bind(&TT::foo, t1, h, bind(w1, _1))(2);
  //bind(&TT::foo, t1, bind(w1, _2), bind(w2, _1))(27, 37); // call t1.foo(w1(37), w2(27))
}

Note on the count_if line: it binds a non-static member function of class WW to the fixed arguments 2 and 4. The first argument, _1, is the object of type WW that the member function is called on.
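The nested form that the slide marks as not compiling is accepted by C++11 std::bind, where an inner bind expression is evaluated with the same call arguments as the outer one. The sketch below assumes C++11 and the std:: spelling; WW and TT are trimmed copies of the slide classes and nested_bind_demo is a hypothetical name. It says nothing about why the TR1 implementation used for the slides rejected the code.

```cpp
#include <cassert>
#include <functional>

struct WW {                        // trimmed copy of the slide's WW
    int operator()(int i) const { return i; }
};

struct TT {                        // trimmed copy of the slide's TT
    int t_;
    explicit TT(int t = 0) : t_(t) {}
    int foo(int i, int j) const { return i + j + t_; }
};

int nested_bind_demo() {
    using namespace std::placeholders;
    WW w1, w2;
    TT t1(13);

    // Member function with the object bound as the first argument.
    assert(std::bind(&TT::foo, t1, 5, 6)() == 24);

    // Inner bind expressions see the outer call's arguments:
    // this evaluates t1.foo(w1(37), w2(27)).
    auto chained = std::bind(&TT::foo, t1, std::bind(w1, _2), std::bind(w2, _1));
    return chained(27, 37);
}
```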

Page 89: C++ Advanced


The bind Function Template

bool inBetweenF(int l, int m, int r){ return (l <= m && m <= r); }

bool matchDelta(int l, int m, int r){ return (m == abs(l - r)); }

void test_function_bind(){
  array<int, 3> r0 = { 1, 2, 3}, r1 = {11, 2, 1};
  array< array<int, 3>*, 2> a2d = {&r0, &r1};

  array<function<bool (int, int, int)>, 2> af; // 2 element array of function objects (empty)

  af[0] = inBetweenF;
  af[1] = matchDelta;

  for(int t = 0; t < 2; ++t){           // iterate twice
    for(int row = 0; row < 2; ++row){   // iterate on each row
      int k = count_if(a2d[row]->begin(), a2d[row]->end(), bind(af[row], 2, _1, 4));
    }
    af[0].swap(af[1]);
  }
}

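[Figure: a2d points to two rows, r0 = {1, 2, 3} and r1 = {11, 2, 1}. Over the two passes each row is processed once by inBetweenF and once by matchDelta. For r0 the calls are (2,1,4) (2,2,4) (2,3,4); for r1 they are (2,11,4) (2,2,4) (2,1,4).]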

Page 90: C++ Advanced


The bind Function Template

bool fooo4(int i, float j, char c, double d) {/* */}
bool fooo3(int i, float j, char c) {/* */}
bool fooo2(int i, float) {/* */}
bool fooo1(int i) {/* */}

function<bool (int, float, char, double)> f4p;
function<bool (int, float, char)> f3p;
function<bool (int, float)> f2p;
function<bool (int)> f1p;

f4p = fooo4; f4p(3, 5.5, 'w', 9.9);
f3p = fooo3; f3p(3, 5.5, 'w');
f2p = fooo2; f2p(3, 5.5);
f1p = fooo1; f1p(3);

bind(f4p, _1, 7.7, _2, 9.9)(5, 'w');
bind(f4p, 5, 7.7, _2, _1)(9.8, 'w');

f3p = bind(f4p, _1, _2, _3, 11.11);  f3p(3, 5.5, 'w');
f2p = bind(f4p, _1, _2, 'w', 13.13); f2p(5, 6.6);
f2p = bind(f3p, _1, _2, 'w');        f2p(7, 8.8);
f1p = bind(f4p, _1, 7.7, 'w', 9.9);  f1p(5);
f1p = bind(f3p, _1, 7.7, 'w');       f1p(5);
f1p = bind(f2p, _1, 7.7);            f1p(5);

function <bool (double)> f1pd;
f1pd = bind(f4p, 5, 7.7, 'w', _1);   f1pd(9.9);

[Figure: argument mapping for bind(f4p, _1, 7.7, _2, 9.9)(5, 'w') against bool fooo4(int i, float j, char c, double d): _1 receives 5 and feeds i, 7.7 is bound to j, _2 receives 'w' and feeds c, 9.9 is bound to d.]
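A compact sketch of the same arity-reduction pattern, assuming C++11 std::function and std::bind. The helper pack is hypothetical (it is not one of the slide's fooo functions); it encodes its arguments as digits so it is easy to see which value reached which parameter.

```cpp
#include <cassert>
#include <functional>

// Hypothetical helper: encodes its four arguments as decimal digits.
int pack(int a, int b, int c, int d) { return a * 1000 + b * 100 + c * 10 + d; }

int arity_demo() {
    using namespace std::placeholders;
    std::function<int(int, int, int, int)> f4 = pack;

    // Fix the last parameter: four parameters become three.
    std::function<int(int, int, int)> f3 = std::bind(f4, _1, _2, _3, 9);
    assert(f3(1, 2, 3) == 1239);

    // Build on f3: three parameters become two.
    std::function<int(int, int)> f2 = std::bind(f3, _1, _2, 7);

    // Placeholders may also reorder the call arguments.
    std::function<int(int, int)> swapped = std::bind(f4, _2, _1, 0, 0);
    assert(swapped(1, 2) == 2100);

    return f2(1, 2);  // evaluates pack(1, 2, 7, 9)
}
```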

Page 91: C++ Advanced


Performance C++ : inlining

C++ code tends to have many more small functions than C: constructors, destructors, operators, member access functions, etc.
Function call overhead may dominate your runtime without you being aware of it.
Inlining is the first and most important runtime optimization you need to consider!

class X{
public:
  X() {}                                        // default constructor
  X(int i) {}                                   // constructor taking one int as argument
  ~X() {}                                       // destructor
  X & operator = (const X & rs) {}              // assignment
  bool operator == (const X & rs) const {}      // comparison operator
  bool operator != (const X & rs) const {}      // comparison operator
  X & operator ++ ( ) {}                        // pre-increment
  X operator ++ (int) {}                        // post-increment
  // …
};

int main(void){
  for(X x1(0); x1 != X(100); x1++){
    X temp[10];
    // …
    temp[0] = x1;
    // 1ct + 100*(1comp + 1ct + 1dt + 1ct + 1dt + 1post_incr +
    //            10ct + 1ass + 10dt) = 2701 function calls
  }
  return 0;
}

For this example, if you don’t have good inlining, your code will be very slow!

Page 92: C++ Advanced


inlining - case study 1

// declaration
int myMax(int a, int b);

// definition
int myMax(int a, int b) {
  return a > b ? a : b;
}

long foo(int v1, int v2){
  long k = 0;
  for(int i = 0; i < v1; ++i)
    for(int j = 0; j < v2; ++j)
      k += myMax(i, j);
  return k;
}

int main(void){
  foo(500000, 10000);
  return 0;
}

g++ m.cc t255.cc
elapsed time: 32.85

symbol table:
0000000000000032 T _Z3fooii
0000000000000000 T _Z5myMaxii

# myMax function body
.globl _Z5myMaxii
  .type _Z5myMaxii, @function
_Z5myMaxii:
.LFB2:
  .file 1 "t255.cc"
  .loc 1 8 0
  pushq %rbp
.LCFI0:
  movq %rsp, %rbp
.LCFI1:
  movl %edi, -4(%rbp)
  movl %esi, -8(%rbp)
  .loc 1 10 0
  movl -4(%rbp), %eax
  cmpl -8(%rbp), %eax
  jle .L2
  movl -4(%rbp), %eax
  movl %eax, -12(%rbp)
  jmp .L3
.L2:
  movl -8(%rbp), %eax
  movl %eax, -12(%rbp)
.L3:
  movl -12(%rbp), %eax
  .loc 1 11 0
  leave
  ret

.globl _Z3fooii
  .type _Z3fooii, @function
_Z3fooii:
.LFB3:
  .loc 1 14 0
  pushq %rbp
.LCFI2:
  movq %rsp, %rbp
.LCFI3:
  subq $24, %rsp
.LCFI4:
  movl %edi, -20(%rbp)
  movl %esi, -24(%rbp)
.LBB2:
  .loc 1 16 0
  movq $0, -16(%rbp)
.LBB3:
  .loc 1 17 0
  movl $0, -8(%rbp)
  jmp .L6
.L9:
.LBB4:
  .loc 1 18 0
  movl $0, -4(%rbp)
  jmp .L7
.L8:
  .loc 1 19 0
  movl -4(%rbp), %esi
  movl -8(%rbp), %edi
  call _Z5myMaxii          # call to myMax
  cltq
  addq %rax, -16(%rbp)
  .loc 1 18 0
  addl $1, -4(%rbp)
.L7:
  movl -4(%rbp), %eax
  cmpl -24(%rbp), %eax
  jl .L8
.LBE4:
  .loc 1 17 0
  addl $1, -8(%rbp)
.L6:
  movl -8(%rbp), %eax
  cmpl -20(%rbp), %eax
  jl .L9
.LBE3:
  .loc 1 20 0
  movq -16(%rbp), %rax
.LBE2:
  .loc 1 22 0
  leave
  ret

Page 93: C++ Advanced


inlining - case study 1

// declaration
inline int myMax(int a, int b);

// definition
int myMax(int a, int b) {
  return a > b ? a : b;
}

long foo(int v1, int v2){
  long k = 0;
  for(int i = 0; i < v1; ++i)
    for(int j = 0; j < v2; ++j)
      k += myMax(i, j);
  return k;
}

int main(void){
  foo(500000, 10000);
  return 0;
}

g++ m.cc t255.cc
elapsed time: 22.94

symbol table:
0000000000000020 T _Z3fooii
0000000000000000 T _Z5myMaxii

# myMax function body
.globl _Z5myMaxii
  .type _Z5myMaxii, @function
_Z5myMaxii:
.LFB2:
  .file 1 "t255.cc"
  .loc 1 8 0
  pushq %rbp
.LCFI0:
  movq %rsp, %rbp
.LCFI1:
  movl %edi, -4(%rbp)
  movl %esi, -8(%rbp)
  .loc 1 10 0
  movl -4(%rbp), %eax
  cmpl -8(%rbp), %eax
  jle .L2
  movl -4(%rbp), %eax
  movl %eax, -12(%rbp)
  jmp .L3
.L2:
  movl -8(%rbp), %eax
  movl %eax, -12(%rbp)
.L3:
  movl -12(%rbp), %eax
  .loc 1 11 0
  leave
  ret

_Z3fooii:
.LFB3:
  .loc 1 14 0
  pushq %rbp
.LCFI2:
  movq %rsp, %rbp
.LCFI3:
  movl %edi, -36(%rbp)
  movl %esi, -40(%rbp)
.LBB7:
  .loc 1 16 0
  movq $0, -24(%rbp)
.LBB8:
  .loc 1 17 0
  movl $0, -16(%rbp)
  jmp .L6
.L11:
.LBB9:
  .loc 1 18 0
  movl $0, -12(%rbp)
  jmp .L7
.L10:
  movl -16(%rbp), %eax
  movl %eax, -4(%rbp)
  movl -12(%rbp), %eax
  movl %eax, -8(%rbp)
.LBB10:
.LBB11:
  # body of myMax inserted into code, no more call to myMax
  .loc 1 10 0
  movl -4(%rbp), %eax
  cmpl -8(%rbp), %eax
  jle .L8
  movl -4(%rbp), %eax
  movl %eax, -44(%rbp)
  jmp .L9
.L8:
  movl -8(%rbp), %eax
  movl %eax, -44(%rbp)
.L9:
  movl -44(%rbp), %eax
.LBE11:
.LBE10:
  .loc 1 19 0
  cltq
  addq %rax, -24(%rbp)
  .loc 1 18 0
  addl $1, -12(%rbp)
.L7:
  movl -12(%rbp), %eax
  cmpl -40(%rbp), %eax
  jl .L10
.LBE9:
  .loc 1 17 0
  addl $1, -16(%rbp)
.L6:
  movl -16(%rbp), %eax
  cmpl -36(%rbp), %eax
  jl .L11
.LBE8:
  .loc 1 20 0
  movq -24(%rbp), %rax
.LBE7:
  .loc 1 22 0
  leave
  ret

Page 94: C++ Advanced


inlining - case study 1

// declaration
inline int myMax(int a, int b);

// definition
int myMax(int a, int b) {
  return a > b ? a : b;
}

long foo(int v1, int v2){
  long k = 0;
  for(int i = 0; i < v1; ++i)
    for(int j = 0; j < v2; ++j)
      k += myMax(i, j);
  return k;
}

int main(void){
  foo(500000, 10000);
  return 0;
}

g++ -O2 m.cc t255.cc
elapsed time: 6.26

symbol table:
0000000000000000 T _Z3fooi

myMax function doesn't even show up in the symbol table

.globl _Z3fooii
  .type _Z3fooii, @function
_Z3fooii:
.LFB3:
  .file 1 "t255.cc"
  .loc 1 14 0
.LVL0:
.LBB7:
.LBB8:
.LBB9:
  .loc 1 18 0
  xorl %r8d, %r8d
.LVL1:
.LBE9:
  .loc 1 17 0
  xorl %ecx, %ecx
.LVL2:
  testl %edi, %edi
  jle .L5
.LVL3:
  .p2align 4,,10
  .p2align 3
.L3:
  xorl %edx, %edx
.LVL4:
.LBB10:
  .loc 1 18 0
  testl %esi, %esi
  jle .L7
  .p2align 4,,10
  .p2align 3
.L9:
  # body of myMax inserted into code, no more call to myMax
  .loc 1 19 0
  cmpl %ecx, %edx
  movl %ecx, %eax
  cmovge %edx, %eax
  .loc 1 18 0
  addl $1, %edx
.LVL5:
  .loc 1 19 0
  cltq
  addq %rax, %r8
  .loc 1 18 0
  cmpl %edx, %esi
  jg .L9
.L7:
.LBE10:
  .loc 1 17 0
  addl $1, %ecx
.LVL6:
  cmpl %ecx, %edi
  jg .L3
.L5:
.LBE8:
.LBE7:
  .loc 1 22 0
  movq %r8, %rax
  ret

Page 95: C++ Advanced


inline – comparing various options

Configuration                                        1    2    3    4    5    6
-O2                                                  No   No   No   No   Yes  Yes
inline keyword                                       No   Yes  No   No   Yes  Yes
--no-inline compiler option                          No   No   No   Yes  Yes  No
__attribute__ ((always_inline)) compiler directive   No   No   Yes  Yes  No   No
did compiler inline the function?                    No   No   Yes  Yes  No   Yes
does the object file contain the function?           Yes  Yes  Yes  Yes  Yes  No
Runtime                                              32   32   22   22   11   6

Page 96: C++ Advanced


inlining - case study 2

// t.h
#define II inline
//#define II

class X{
private:
  int i_;
public:
  X(int i);
  ~X();
  bool operator > (const X & rs) const;
  X & operator = (const X & rs);
  X & operator ++ (void);
  int i(void) const;
};

II X::X(int i) : i_(i) {}
II X::~X() {}
II bool X::operator > (const X & rs) const { return i_ > rs.i_; }
II int X::i(void) const { return i_; }
II X & X::operator ++(void) { ++i_; return * this; }

// declaration
II int myMax(X a, X b);

// definition
int myMax(X a, X b) {
  return a > b ? a.i() : b.i();
}

#ifdef II
  #undef II
#endif

// t.cc
#include "t.h"

long foo(X v1, X v2){
  long k = 0;
  for(X i = 0; !(i > v1); ++i)
    for(X j = 0; !(j > v2); ++j)
      k += myMax(i, j);
  return k;
}

// m.cc
#include "t.h"

long foo(X v1, X v2);

int main(void){
  long res = foo(X(200000), X(10000));
  return 0;
}

g++ m.cc t.cc
nm t.o
00000000000000cd T _Z3foo1XS_
000000000000008d T _Z5myMax1XS_
0000000000000016 T _ZN1XC1Ei
0000000000000000 T _ZN1XC2Ei
0000000000000036 T _ZN1XD1Ev
000000000000002c T _ZN1XD2Ev
0000000000000070 T _ZN1XppEv
0000000000000060 T _ZNK1X1iEv
0000000000000040 T _ZNK1XgtERKS
runtime 37.46

g++ -O2 m.cc t.cc
nm t.o
0000000000000000 T _Z3foo1XS_
runtime 2.3

16x speedup!

Page 97: C++ Advanced


inlining – case study 3

class Rectangle{
public:
  int xl, yl, xh, yh;
};

void foo(Rectangle r1){
  r1.xl = r1.yh;
}

  .type _Z3foo9Rectangle, @function
_Z3foo9Rectangle:
.LFB2:
  pushq %rbp
.LCFI0:
  movq %rsp, %rbp
.LCFI1:
  movq %rdi, -16(%rbp)
  movq %rsi, -8(%rbp)
  movl -4(%rbp), %eax      # load yh into the eax register
  movl %eax, -16(%rbp)     # store it into xl
  leave
  ret

[Figure: stack layout relative to the stack pointer: Return Address at 0, yh at -4, xh at -8, yl at -12, xl at -16. The value moves from yh to xl through the eax register.]

The assignment in question is executed in 2 assembly instructions.
Very fast! As fast as C.

Page 98: C++ Advanced


inlining - case study 3

class D1D{
private:
  int v_; // 0 - LOW, 1 - HIGH
public:
  D1D(int i) : v_(i) {}
  operator int () { return v_; }
};

class D2D{
private:
  int v_; // 0 WEST, 1 EAST, 2 SOUTH, 3 NORTH
public:
  D2D(int v) : v_(v) {}
  operator D1D () { return D1D(v_ & 1); }
  bool isVertical() { return (v_ >> 1); }
  operator int (void) { return v_; }
};

const D1D LOW(0), HIGH(1);
const D2D WEST(0), EAST(1), SOUTH(2), NORTH(3);

class Interval{
private:
  int v_[2];
public:
  int & get(D1D d) { return v_[d]; }
  Interval(int l, int h) { v_[0] = l; v_[1] = h; }
};

class Rectangle{
private:
  Interval i_[2];
public:
  int & get(D2D d) { return i_[d.isVertical()].get(d); }
};

void foo(Rectangle r1){
  r1.get(WEST) = r1.get(NORTH);
}

Functions involved in each Rectangle::get call: isVertical, Interval::get, D1D::op int(), D2D::op D1D().

The foo() function can be made almost as fast as the one on the previous page, provided that all the functions inline.
Otherwise, it could be much slower!

Page 99: C++ Advanced


inlining – migration

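[Figure: the same call graph of functions f1 to f6 drawn twice, before and after forcing f4 inline. Edges carry call counts (10, 10, 4, 4, 6), each function is marked small or big, and the legend distinguishes inlined from out-of-line functions. In the first graph the problem is an out-of-line function reached by 4 calls; in the second graph the problem is an out-of-line function reached by 10 calls.]

// force f4 to be inlined
inline void f4() __attribute__ ((always_inline));

The compiler was forced to inline f4, but in the process it gave up on trying to inline f3, so the code actually became slower!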

Page 100: C++ Advanced


inlining - summary

• C++ tends to use many small functions
• C++ function inlining is critical to runtime
• inlining can make a huge runtime difference, up to 10x (depending on program and code writing style)
• inlining is black magic, depending on many factors
  – code style
  – size of file
  – size of functions
  – what's in the functions (loops, calls, recursion, etc)
  – where the function definition is placed (inside the class, before/after call)
  – inline keyword
  – __attribute__ ((always_inline)) compiler directive
  – version of compiler: re-assess when changing compiler version
  – --no-inline compiler option
  – -finline-limit=nn compiler option: depends on nn, experiment with various values
• even if you tried everything you could, the compiler may not inline
• inlining may happen in some calls, but not all
• if inlining is done properly and everywhere, then you should not see the symbol of the function in the *.o object file (use nm)
• do a profiling of your code
  – only profile optimized code with inlining, otherwise the findings are meaningless
  – if you see any low level functions show up in your profiling, it means they were not inlined: fix the issue
• factors that are detrimental to inlining
  – big function
  – big file
  – function containing loops
  – recursive function
  – function containing switch statement (switch does introduce a non-inlined function call for each case!)
  – not seeing the function definition at the call site
  – making a chain of inline calls
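Two of the factors listed above, the always_inline directive and having the definition visible at the call site, can be sketched as follows. The FORCE_INLINE macro is a hypothetical name introduced here, and the attribute spelling is specific to GCC-compatible compilers; other compilers fall back to plain inline in this sketch.

```cpp
#include <cassert>

// Hypothetical convenience macro; the attribute is a GCC/Clang extension.
#if defined(__GNUC__)
#define FORCE_INLINE inline __attribute__((always_inline))
#else
#define FORCE_INLINE inline
#endif

// The definition is visible before the call site, which inlining requires.
FORCE_INLINE int myMax(int a, int b) { return a > b ? a : b; }

long foo(int v1, int v2) {
    assert(v1 >= 0 && v2 >= 0);
    long k = 0;
    for (int i = 0; i < v1; ++i)
        for (int j = 0; j < v2; ++j)
            k += myMax(i, j);  // candidate for inlining into the loop body
    return k;
}
```

Whether the call was really inlined still has to be checked on the object file, for example by looking for the myMax symbol with nm as the slides suggest.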

Page 101: C++ Advanced


Code vs. Heap vs. Stack vs. Static Memory

int a;
static int b;
extern int x = 3;
namespace BB{
  int b;
}

class Z{
public:
  static int zz;
  int k;
};

int Z::zz;

extern int y; // no storage

Z z1;

void foo() {}
void bar() {}

int main(void){
  int jj;
  static int s;
  Z z2;
  char * cp;

  cp = new char[100];

  Z * zp;
  zp = new Z;
}

Static Memory
  a        0x6013c0
  b        0x6013d4
  x        0x601298
  BB::b    0x6013c4
  Z::zz    0x6013c8
  zp->zz   0x6013c8
  z1       0x6013cc
  s        0x6013d8

Code Memory
  foo      0x4009e0
  bar      0x4009f0
  main     0x400a30

Heap Memory
  *zp      0x602080
  zp->k    0x602080
  *cp      0x602010

Stack Memory
  jj       0x7fbfffeafc
  z2       0x7fbfffeaf0
  cp       0x7fbfffeae8
  zp       0x7fbfffeae0

The 4 memory segments can be placed in very different address spaces.

Page 102: C++ Advanced


Performance C++ : Caching

int main(void){
  list<int> s;
  vector<int> v;

  for(int i = 0; i < 50000000; ++i){
    s.push_back(i);
    v.push_back(i);
  }
  long ss = 0, sv = 0;

  for(list<int>::iterator sit = s.begin(); sit != s.end(); ++sit){
    ss += *sit;
  }

  for(int i = 0; i < v.size(); ++i){
    sv += v[i];
  }
  return 0;
}

[Figure: the list is drawn as a control node plus nodes that each hold next, prev and an int, sitting in the list's allocator (heap) with all list elements allocated one after another. The vector is drawn as one block with begin, middle and end. Measured loop times: list 0.54 sec, vector 0.09 sec (6x faster).]

vector is faster: elements are next to each other in memory, good cache performance

list is slower (6x) because it needs an extra dereferencing. Elements are in the heap next to each other, but each element is bigger (2 extra pointers), which slows it down (more memory traffic, more cache misses)

Page 103: C++ Advanced


Caching - effects on performance

int main(void){
  list<int> s, s2;
  vector<int> v;

  for(int i = 0; i < 50000000; ++i){
    s.push_back(i);
    s2.push_back(i); // just to spread elements
    v.push_back(i);
  }
  long ss = 0, sv = 0;
  for(list<int>::iterator sit = s.begin(); sit != s.end(); ++sit){
    ss += *sit;
  }

  for(vector<int>::iterator vit = v.begin(); vit != v.end(); ++vit){
    sv += *vit;
  }
  return 0;
}

[Figure: same list and vector drawing as on the previous slide. Measured loop times: list 1.08 sec (2x slower than before), vector 0.06 sec (30% faster than before).]

vector is faster: elements are next to each other in memory, good cache performance. In addition, now we're using the iterator, which has a better cache performance than using an index into the vector (which btw. introduces an extra multiply)

now the list is even slower (18x), because its elements are not following each other in the heap: there's an element of list s2 wedged in-between, worsening the cache misses, making the list twice as slow as before
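The comparison can be repeated with a small timing harness. This is a sketch assuming C++11 (std::chrono); timed_sum and compare_traversal are hypothetical names, and the timings it reports depend on machine, compiler and flags, so the numbers from the slides should not be expected to reproduce exactly.

```cpp
#include <cassert>
#include <chrono>
#include <list>
#include <numeric>
#include <vector>

// Sums a container and reports the elapsed wall-clock time in seconds.
template <typename Container>
long long timed_sum(const Container& c, double& seconds) {
    auto start = std::chrono::steady_clock::now();
    long long sum = std::accumulate(c.begin(), c.end(), 0LL);
    auto stop = std::chrono::steady_clock::now();
    seconds = std::chrono::duration<double>(stop - start).count();
    return sum;
}

// Fills a list and a vector with the values 0..n-1 and sums both.
// Returns true when both sums equal n*(n-1)/2; timings come back
// through the two out-parameters.
bool compare_traversal(int n, double& list_seconds, double& vector_seconds) {
    std::list<int> s;
    std::vector<int> v;
    for (int i = 0; i < n; ++i) {
        s.push_back(i);   // one heap node per element
        v.push_back(i);   // contiguous storage
    }
    long long ss = timed_sum(s, list_seconds);
    long long sv = timed_sum(v, vector_seconds);
    assert(ss == sv);
    return ss == static_cast<long long>(n) * (n - 1) / 2;
}
```

Calling compare_traversal with a large n and printing the two timings gives a quick local check of the list versus vector gap.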

Page 104: C++ Advanced


stl set vs. vector find comparison

[Chart: ns per random access (y axis, 0 to 180) versus num of elements (x axis, 10 to 200), with three series: vector.find(b, e, val); for(it=v.b()..) if(*it == val); set.find.]

Test data:
  vtemp  vector, numbers 1-M shuffled                (e.g. 2 6 1 4 3 5)
  v      vector, numbers 1-M shuffled (differently)  (e.g. 6 4 3 2 1 5)
  s      set, numbers 1-M ordered on RB tree

// linear search in vector using std::find
for(vector<X>::iterator vit = vtemp.begin(); vit != vtemp.end(); ++vit){
  i = find(v.begin(), v.end(), *vit);
}

// linear search in vector using iteration begin -> end
for(vector<X>::iterator vit = vtemp.begin(); vit != vtemp.end(); ++vit){
  for(i = v.begin(); i != v.end(); ++i){
    if(*i == *vit) break;
  }
}

// O(logn) search on set using set::find
for(vector<X>::iterator vit = vtemp.begin(); vit != vtemp.end(); ++vit){
  si = s.find(*vit);
}

// Library source: this is an overload used by find() for the RAI case.
template<typename _RandomAccessIterator, typename _Tp>
  _RandomAccessIterator
  __find(_RandomAccessIterator __first, _RandomAccessIterator __last,
         const _Tp& __val, random_access_iterator_tag)
  {
    typename iterator_traits<_RandomAccessIterator>::difference_type
      __trip_count = (__last - __first) >> 2;

    for (; __trip_count > 0; --__trip_count)
      {
        if (*__first == __val)
          return __first;
        ++__first;

        if (*__first == __val)
          return __first;
        ++__first;

        if (*__first == __val)
          return __first;
        ++__first;

        if (*__first == __val)
          return __first;
        ++__first;
      }

    switch (__last - __first)
      {
      case 3:
        if (*__first == __val)
          return __first;
        ++__first;
      case 2:
        if (*__first == __val)
          return __first;
        ++__first;
      case 1:
        if (*__first == __val)
          return __first;
        ++__first;
      case 0:
      default:
        return __last;
      }
  }

Note: linear find() on a vector is faster than logarithmic set::find() for M < 160 !!

loop unrolling in the vector find() makes a big difference!!
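The two lookups being compared can be written side by side. This sketch assumes C++11 and plain int elements; count_found is a hypothetical helper that only checks that both searches agree, it does not measure time.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

// Looks up every key with a linear std::find over the vector and with
// std::set::find, and returns how many keys were found by both.
int count_found(const std::vector<int>& v, const std::set<int>& s,
                const std::vector<int>& keys) {
    int hits = 0;
    for (int key : keys) {
        // O(n) scan over contiguous memory
        bool in_vector = std::find(v.begin(), v.end(), key) != v.end();
        // O(log n) descent through tree nodes
        bool in_set = s.find(key) != s.end();
        assert(in_vector == in_set);
        if (in_vector) ++hits;
    }
    return hits;
}
```

Wrapping each lookup loop in a timer and sweeping the container size is one way to locate the crossover point on a given machine.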

Page 105: C++ Advanced


Caching - can you spot the problems?

// Example 1
vector<int> vx, vy; // huge vectors, storing x and y coordinates

class Rectangle{
public:
  int xl, yl, xh, yh;
  Rectangle(size_t xli, size_t yli, size_t xhi, size_t yhi) :
    xl(vx[xli]), yl(vy[yli]), xh(vx[xhi]), yh(vy[yhi]) {}
};

// Example 2
vector<int> v;

void foo(){
  vector<int> w;
  for( .. ) {
    temp = v[i] + w[i];
  }
}

// Example 3
void bar(){
  set<int> si;
  map<int, char> mi;
  for( .. ){
    si.insert(i);
    mi.insert(make_pair(i, c));
  }
}

Page 106: C++ Advanced


Caching – summary

• Cache misses can slow down the program significantly
• You may not even be aware of cache misses, because they don't show up in simple profilers
  – foo() calls bar() 1M times; bar() is inlined, inside of bar() there's a cache miss; all you notice is that foo() is slow
• Small memory is best (less cache traffic)
• Sequential access is best, don't jump erratically in memory
• Cache misses happen frequently on code as well (not just data), which is hard to detect
  – short code better than long code
  – short loops better than long loops
• Don't switch unnecessarily between stack, heap, static memory segments
• When you go from 32 bits to 64 bits, data tends to grow (~1.5x); more data traffic means more cache misses, which means a slower program
• Prefer memory blocks allocated in one chunk over elements allocated individually (they tend to scatter in memory)
• If you use many objects of the same type, consider writing your own memory allocator (new/delete) for that type
• Use VTune to profile your code! Intel wrote it, and it's surprisingly good.

Page 107: C++ Advanced


Performance C++ : if-less code

// median function implementation in STL
inline const int & median(const int & a, const int & b, const int & c)
{
  if (a < b)
    if (b < c)
      return b;
    else if (a < c)
      return c;
    else
      return a;
  else if (a < c)
    return a;
  else if (b < c)
    return c;
  else
    return b;
}

_Z6medianRKiS0_S0_:
.LFB2:
  movl (%rdi), %ecx
  movl (%rsi), %eax
  cmpl %eax, %ecx
  jge .L2
  movl (%rdx), %r8d
  cmpl %r8d, %eax
  jl .L3
  cmpl %r8d, %ecx
  jge .L5
.L4:
  movq %rdx, %rsi
.L3:
  movq %rsi, %rax
  .p2align 4,,1
  .p2align 3
  ret
  .p2align 4,,10
  .p2align 3
.L2:
  movl (%rdx), %r8d
  cmpl %r8d, %ecx
  .p2align 4,,2
  .p2align 3
  jl .L5
  cmpl %r8d, %eax
  .p2align 4,,2
  .p2align 3
  jl .L4
  movq %rsi, %rax
  .p2align 4,,3
  .p2align 3
  ret
  .p2align 4,,10
  .p2align 3
.L5:
  movq %rdi, %rsi
  movq %rsi, %rax
  .p2align 4,,4
  .p2align 3
  ret

Page 108: C++ Advanced


if-less code

const int & median(const int & a, const int & b, const int & c)
{
  const bool alb = a < b;
  const bool blc = b < c;
  const bool alc = a < c;
  const int * input[3] = {&a, &b, &c};
  unsigned int index = 0;
  index += (alb & blc) | (!alb & !blc);
  index += (unsigned int)((alc & !blc) | (!alc & blc)) << 1;
  return *(input[index]);
}

__Z6medianRKiS0_S0_:
.LFB2:
  movl (%rsi), %ecx
  movl (%rdi), %r8d
  movl (%rdx), %eax
  movq %rsi, -32(%rsp)
  movq %rdx, -24(%rsp)
  movq %rdi, -40(%rsp)
  cmpl %ecx, %r8d
  setl %r9b
  cmpl %eax, %ecx
  setl %cl
  cmpl %eax, %r8d
  movl %r9d, %edx
  setl %r8b
  movzbl %cl, %esi
  xorl $1, %ecx
  movl %r8d, %eax
  movzbl %cl, %ecx
  xorl $1, %edx
  xorl $1, %eax
  andl %ecx, %r8d
  andl %edx, %ecx
  andl %esi, %eax
  andl %r9d, %esi
  orl %r8d, %eax
  orl %esi, %ecx
  leal (%rcx,%rax,2), %eax
  mov %eax, %eax
  movq -40(%rsp,%rax,8), %rax
  movl (%rax), %eax
  ret

straight code, no ifs, no conditional jumps, faster!
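Since the index arithmetic is easy to get wrong, it is worth checking the branch-free version against a plain reference. The sketch below copies the slide's index computation; sorted_median and agrees_with_sort are hypothetical helpers added here, and the check covers every triple from a small range, including ties.

```cpp
#include <algorithm>
#include <cassert>

// Branch-free selection, following the slide's index computation.
const int& median(const int& a, const int& b, const int& c) {
    const bool alb = a < b;
    const bool blc = b < c;
    const bool alc = a < c;
    const int* input[3] = {&a, &b, &c};
    unsigned int index = 0;
    index += (alb & blc) | (!alb & !blc);                                  // 1 when b is the median
    index += static_cast<unsigned int>((alc & !blc) | (!alc & blc)) << 1;  // 2 when c is the median
    return *input[index];                                                  // 0 selects a
}

// Reference answer: sort the three values and take the middle one.
int sorted_median(int a, int b, int c) {
    int t[3] = {a, b, c};
    std::sort(t, t + 3);
    return t[1];
}

// Compares both implementations on every triple drawn from [0, n).
bool agrees_with_sort(int n) {
    assert(n >= 0);
    for (int a = 0; a < n; ++a)
        for (int b = 0; b < n; ++b)
            for (int c = 0; c < n; ++c)
                if (median(a, b, c) != sorted_median(a, b, c)) return false;
    return true;
}
```

The two index terms are mutually exclusive, so index only ever takes the values 0, 1 or 2.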

Page 109: C++ Advanced


Runtime comparison between stl median and ifless median

unsigned int M = 100000000;
vector<int> v;
for(int i = 0; i < M; ++i){
  v.push_back(rand());
}

// loop timed as ts
for(size_t i = 2; i < M; ++i){
  ss1 += stl_median(v[i-2], v[i-1], v[i]);
}

// loop timed as tg
for(size_t i = 2; i < M; ++i){
  ss1 += median(v[i-2], v[i-1], v[i]);
}

// loop timed as ts8
for(size_t i = 2; i < M; ++i){
  ss1 += stl_median(v[i-2], v[i-1], v[i++]);
  …
  ss1 += stl_median(v[i-2], v[i-1], v[i++]); // 8-way loop unrolling
}

// loop timed as tg8
for(size_t i = 2; i < M; ++i){
  ss1 += median(v[i-2], v[i-1], v[i++]);
  …
  ss1 += median(v[i-2], v[i-1], v[i++]); // 8-way loop unrolling
}

ts  = total time in for loop for stl median
tg  = total time in for loop for if-less median
ts8 = total time in for loop for stl median, 8-way unrolled
tg8 = total time in for loop for if-less median, 8-way unrolled

With O standing for the loop overhead (shared by 8 calls in the unrolled loops):

tg/ts   = (median + O) / (stl_median + O)     = 0.82
tg8/ts8 = (median + O/8) / (stl_median + O/8) = 0.66

median/stl_median = 0.63

if-less median is 37% faster than stl median!

Page 110: C++ Advanced


Performance – summary

• use vector: it's the best data structure
• only switch to another data structure after you tried and profiled the vector implementation
• avoid stupid mistakes
  – passing large objects by value
  – having large temporary variables in a function
  – having too many if statements: branching is very costly
  – try if-less code if appropriate
• use latest compilers
• keep files that contain "hot" code small
  – then optimize them maximally
• constantly profile your code
  – small changes may have unexpected side-effects
• unsupported code tends to slow down over time
• small memory is good
• small code is good
• use iterators instead of indexing
• use provided algorithms, don't reinvent them
• use STL, boost
• inline, inline, inline
• when you've done all of the above, then change your algorithm from O(n*n) to O(n * lg n)