armadillo: template metaprogramming for linear...

45
1 Introduction Template Metaprogramming TMP in Armadillo Conclusion Armadillo: Template Metaprogramming for Linear Algebra Leo Antunes and Markus Kruber Seminar on Automation, Compilers, and Code-Generation HPAC June 2014 Leo Antunes and Markus Kruber Armadillo

Upload: others

Post on 17-Feb-2021

2 views

Category:

Documents


0 download

TRANSCRIPT

  • 1

    IntroductionTemplate Metaprogramming

    TMP in ArmadilloConclusion

    Armadillo: Template Metaprogramming forLinear Algebra

    Leo Antunes and Markus Kruber

    Seminar on Automation, Compilers, and Code-GenerationHPAC

    June 2014

    Leo Antunes and Markus Kruber Armadillo

    http://hpac.rwth-aachen.de

  • 2

    IntroductionTemplate Metaprogramming

    TMP in ArmadilloConclusion

    What is Armadillo?Code example

    What is Armadillo?

    . C++ Library for Linear Algebra

    . Aims for ease of use

    . Still wants performance

    . Abstract interface to BLAS and LAPACK

    . Pure template library (only headers)

    Leo Antunes and Markus Kruber Armadillo

  • 2

    IntroductionTemplate Metaprogramming

    TMP in ArmadilloConclusion

    What is Armadillo?Code example

    What is Armadillo?

    . C++ Library for Linear Algebra

    . Aims for ease of use

    . Still wants performance

    . Abstract interface to BLAS and LAPACK

    . Pure template library (only headers)

    Leo Antunes and Markus Kruber Armadillo

  • 2

    IntroductionTemplate Metaprogramming

    TMP in ArmadilloConclusion

    What is Armadillo?Code example

    What is Armadillo?

    . C++ Library for Linear Algebra

    . Aims for ease of use

    . Still wants performance

    . Abstract interface to BLAS and LAPACK

    . Pure template library (only headers)

    Leo Antunes and Markus Kruber Armadillo

  • 2

    IntroductionTemplate Metaprogramming

    TMP in ArmadilloConclusion

    What is Armadillo?Code example

    What is Armadillo?

    . C++ Library for Linear Algebra

    . Aims for ease of use

    . Still wants performance

    . Abstract interface to BLAS and LAPACK

    . Pure template library (only headers)

    Leo Antunes and Markus Kruber Armadillo

  • 3

    IntroductionTemplate Metaprogramming

    TMP in ArmadilloConclusion

    What is Armadillo?Code example

    Code example

    Calculating A−1BvvT :

    mat A =

  • 4

    IntroductionTemplate Metaprogramming

    TMP in ArmadilloConclusion

    C++ TemplatesOverviewExamples

    Refresher: C++ Templates

    . Allow type-agnostic development (e.g.: class Array)”generics” in Java, C#; ”parametric polymorphism” in Haskell, Scala

    . Realized at compile-time → code generation

    . Generate different class for each type used in code(thus: ”template”)

    . Therefore: no dynamic typing

    . Allows for further optimizations by compiler

    Leo Antunes and Markus Kruber Armadillo

  • 4

    IntroductionTemplate Metaprogramming

    TMP in ArmadilloConclusion

    C++ TemplatesOverviewExamples

    Refresher: C++ Templates

    . Allow type-agnostic development (e.g.: class Array)”generics” in Java, C#; ”parametric polymorphism” in Haskell, Scala

    . Realized at compile-time → code generation

    . Generate different class for each type used in code(thus: ”template”)

    . Therefore: no dynamic typing

    . Allows for further optimizations by compiler

    Leo Antunes and Markus Kruber Armadillo

  • 4

    IntroductionTemplate Metaprogramming

    TMP in ArmadilloConclusion

    C++ TemplatesOverviewExamples

    Refresher: C++ Templates

    . Allow type-agnostic development (e.g.: class Array)”generics” in Java, C#; ”parametric polymorphism” in Haskell, Scala

    . Realized at compile-time → code generation

    . Generate different class for each type used in code(thus: ”template”)

    . Therefore: no dynamic typing

    . Allows for further optimizations by compiler

    Leo Antunes and Markus Kruber Armadillo

  • 4

    IntroductionTemplate Metaprogramming

    TMP in ArmadilloConclusion

    C++ TemplatesOverviewExamples

    Refresher: C++ Templates

    . Allow type-agnostic development (e.g.: class Array)”generics” in Java, C#; ”parametric polymorphism” in Haskell, Scala

    . Realized at compile-time → code generation

    . Generate different class for each type used in code(thus: ”template”)

    . Therefore: no dynamic typing

    . Allows for further optimizations by compiler

    Leo Antunes and Markus Kruber Armadillo

  • 4

    IntroductionTemplate Metaprogramming

    TMP in ArmadilloConclusion

    C++ TemplatesOverviewExamples

    Refresher: C++ Templates

    . Allow type-agnostic development (e.g.: class Array)”generics” in Java, C#; ”parametric polymorphism” in Haskell, Scala

    . Realized at compile-time → code generation

    . Generate different class for each type used in code(thus: ”template”)

    . Therefore: no dynamic typing

    . Allows for further optimizations by compiler

    Leo Antunes and Markus Kruber Armadillo

  • 5

    IntroductionTemplate Metaprogramming

    TMP in ArmadilloConclusion

    C++ TemplatesOverviewExamples

    C++ Template example

    Type-dependent:

    class Adder{

    int base;

    public:

    Adder(int x):base(x){}

    int doit(int y){

    return base+y;

    }

    };

    Type-agnostic:

    template

    class Adder{

    TYPE base;

    public:

    Adder(TYPE x):base(x){}

    TYPE doit(TYPE y){

    return base+y;

    }

    };

    Leo Antunes and Markus Kruber Armadillo

  • 6

    IntroductionTemplate Metaprogramming

    TMP in ArmadilloConclusion

    C++ TemplatesOverviewExamples

    Template Metaprogramming

    . Use C++ templates as a language in itself

    . Type specification as pattern matching → Branching(independently of C++’s control structures)

    . Template used inside template → Recursion

    . Branching and Recursion → Turing complete

    . Allows offloading of computation to compile-time

    Leo Antunes and Markus Kruber Armadillo

  • 6

    IntroductionTemplate Metaprogramming

    TMP in ArmadilloConclusion

    C++ TemplatesOverviewExamples

    Template Metaprogramming

    . Use C++ templates as a language in itself

    . Type specification as pattern matching → Branching(independently of C++’s control structures)

    . Template used inside template → Recursion

    . Branching and Recursion → Turing complete

    . Allows offloading of computation to compile-time

    Leo Antunes and Markus Kruber Armadillo

  • 6

    IntroductionTemplate Metaprogramming

    TMP in ArmadilloConclusion

    C++ TemplatesOverviewExamples

    Template Metaprogramming

    . Use C++ templates as a language in itself

    . Type specification as pattern matching → Branching(independently of C++’s control structures)

    . Template used inside template → Recursion

    . Branching and Recursion → Turing complete

    . Allows offloading of computation to compile-time

    Leo Antunes and Markus Kruber Armadillo

  • 6

    IntroductionTemplate Metaprogramming

    TMP in ArmadilloConclusion

    C++ TemplatesOverviewExamples

    Template Metaprogramming

    . Use C++ templates as a language in itself

    . Type specification as pattern matching → Branching(independently of C++’s control structures)

    . Template used inside template → Recursion

    . Branching and Recursion → Turing complete

    . Allows offloading of computation to compile-time

    Leo Antunes and Markus Kruber Armadillo

  • 6

    IntroductionTemplate Metaprogramming

    TMP in ArmadilloConclusion

    C++ TemplatesOverviewExamples

    Template Metaprogramming

    . Use C++ templates as a language in itself

    . Type specification as pattern matching → Branching(independently of C++’s control structures)

    . Template used inside template → Recursion

    . Branching and Recursion → Turing complete

    . Allows offloading of computation to compile-time

    Leo Antunes and Markus Kruber Armadillo

  • 7

    IntroductionTemplate Metaprogramming

    TMP in ArmadilloConclusion

    C++ TemplatesOverviewExamples

    Factorial: normal C++

    int factorial(int n) {

    if(n < 1){

    return 1;

    }

    else {

    return n*factorial(n-1);

    }

    }

    Leo Antunes and Markus Kruber Armadillo

  • 8

    IntroductionTemplate Metaprogramming

    TMP in ArmadilloConclusion

    C++ TemplatesOverviewExamples

    Factorial: Template metaprogramming

    template

    struct factorial {

    const static int result =

    n * factorial :: result;

    };

    template

    struct factorial {

    const static int result = 1 ;

    };

    . Types used for pattern matching.

    . Structs used as functions.

    . Most ”specific” template chosen first. (cf. Haskell)

    Leo Antunes and Markus Kruber Armadillo

  • 9

    IntroductionTemplate Metaprogramming

    TMP in ArmadilloConclusion

    OverviewExamples

    TMP in Armadillo

    . Classes for common LA objects: matrix, vector, etc

    . Literal expression in code converted to template tree, e.g.:A*inv(B) → Glue

    . Infers information on arguments via pattern matching

    . User can provide information about objects.e.g.: diagmat(A) or trimatu(A)

    . Actual calculation delayed until runtime request for results

    Leo Antunes and Markus Kruber Armadillo

  • 9

    IntroductionTemplate Metaprogramming

    TMP in ArmadilloConclusion

    OverviewExamples

    TMP in Armadillo

    . Classes for common LA objects: matrix, vector, etc

    . Literal expression in code converted to template tree, e.g.:A*inv(B) → Glue

    . Infers information on arguments via pattern matching

    . User can provide information about objects.e.g.: diagmat(A) or trimatu(A)

    . Actual calculation delayed until runtime request for results

    Leo Antunes and Markus Kruber Armadillo

  • 9

    IntroductionTemplate Metaprogramming

    TMP in ArmadilloConclusion

    OverviewExamples

    TMP in Armadillo

    . Classes for common LA objects: matrix, vector, etc

    . Literal expression in code converted to template tree, e.g.:A*inv(B) → Glue

    . Infers information on arguments via pattern matching

    . User can provide information about objects.e.g.: diagmat(A) or trimatu(A)

    . Actual calculation delayed until runtime request for results

    Leo Antunes and Markus Kruber Armadillo

  • 9

    IntroductionTemplate Metaprogramming

    TMP in ArmadilloConclusion

    OverviewExamples

    TMP in Armadillo

    . Classes for common LA objects: matrix, vector, etc

    . Literal expression in code converted to template tree, e.g.:A*inv(B) → Glue

    . Infers information on arguments via pattern matching

    . User can provide information about objects.e.g.: diagmat(A) or trimatu(A)

    . Actual calculation delayed until runtime request for results

    Leo Antunes and Markus Kruber Armadillo

  • 9

    IntroductionTemplate Metaprogramming

    TMP in ArmadilloConclusion

    OverviewExamples

    TMP in Armadillo

    . Classes for common LA objects: matrix, vector, etc

    . Literal expression in code converted to template tree, e.g.:A*inv(B) → Glue

    . Infers information on arguments via pattern matching

    . User can provide information about objects.e.g.: diagmat(A) or trimatu(A)

    . Actual calculation delayed until runtime request for results

    Leo Antunes and Markus Kruber Armadillo

  • 10

    IntroductionTemplate Metaprogramming

    TMP in ArmadilloConclusion

    OverviewExamples

    Simple example: inv(inv(A))

    . Rationale: should not compute anything; just return A

    . Inner call to inv() returns a Op object.

    . Outter call matches the definition of inv() withOp argument. Returns A.

    . Advantages:

    . Obvious: no actual inversion.

    . Not obvious: multiple-dispatch without the runtime overhead.

    . Allows compiler to get rid of unneeded function calls.

    Leo Antunes and Markus Kruber Armadillo

  • 10

    IntroductionTemplate Metaprogramming

    TMP in ArmadilloConclusion

    OverviewExamples

    Simple example: inv(inv(A))

    . Rationale: should not compute anything; just return A

    . Inner call to inv() returns a Op object.

    . Outter call matches the definition of inv() withOp argument. Returns A.

    . Advantages:

    . Obvious: no actual inversion.

    . Not obvious: multiple-dispatch without the runtime overhead.

    . Allows compiler to get rid of unneeded function calls.

    Leo Antunes and Markus Kruber Armadillo

  • 11

    IntroductionTemplate Metaprogramming

    TMP in ArmadilloConclusion

    OverviewExamples

    Matrix Chain Multiplication

    . Matrix multiplication is associative.(ABC )D = A(BCD) = . . .

    . Order affects the number of operations.

    . A ∈ R60×5, B ∈ R5×30, C ∈ R30×10

    . (AB)C = (60 · 5 · 30) + (60 · 30 · 10) = 27.000

    . A(BC ) = (5 · 30 · 10) + (60 · 5 · 10) = 4.500

    . E.g. Matlab: naive left to right ordering

    . Complexity: O(n3) with dynamic programmingSmarter approach: O(n · log(n))

    Leo Antunes and Markus Kruber Armadillo

  • 11

    IntroductionTemplate Metaprogramming

    TMP in ArmadilloConclusion

    OverviewExamples

    Matrix Chain Multiplication

    . Matrix multiplication is associative.(ABC )D = A(BCD) = . . .

    . Order affects the number of operations.

    . A ∈ R60×5, B ∈ R5×30, C ∈ R30×10

    . (AB)C = (60 · 5 · 30) + (60 · 30 · 10) = 27.000

    . A(BC ) = (5 · 30 · 10) + (60 · 5 · 10) = 4.500

    . E.g. Matlab: naive left to right ordering

    . Complexity: O(n3) with dynamic programmingSmarter approach: O(n · log(n))

    Leo Antunes and Markus Kruber Armadillo

  • 11

    IntroductionTemplate Metaprogramming

    TMP in ArmadilloConclusion

    OverviewExamples

    Matrix Chain Multiplication

    . Matrix multiplication is associative.(ABC )D = A(BCD) = . . .

    . Order affects the number of operations.

    . A ∈ R60×5, B ∈ R5×30, C ∈ R30×10

    . (AB)C = (60 · 5 · 30) + (60 · 30 · 10) = 27.000

    . A(BC ) = (5 · 30 · 10) + (60 · 5 · 10) = 4.500

    . E.g. Matlab: naive left to right ordering

    . Complexity: O(n3) with dynamic programmingSmarter approach: O(n · log(n))

    Leo Antunes and Markus Kruber Armadillo

  • 12

    IntroductionTemplate Metaprogramming

    TMP in ArmadilloConclusion

    OverviewExamples

    Complex example: A*B*C*D

    . Instantiated as:Glue

    . Determines ”depth” of expression: N = 3(amount of operators with same precedence)

    . Pattern matches specialized function for N = 3 which:

    . Decide (ABC )D vs. A(BCD)

    . Heuristic: cost(ABC ) := columns(A) · rows(C )

    . Recursively calls reordering function with N = 2

    . Thus: linear (but sub-optimal) reordering of operations

    . Remember: all this at compile-time!

    Leo Antunes and Markus Kruber Armadillo

  • 12

    IntroductionTemplate Metaprogramming

    TMP in ArmadilloConclusion

    OverviewExamples

    Complex example: A*B*C*D

    . Instantiated as:Glue

    . Determines ”depth” of expression: N = 3(amount of operators with same precedence)

    . Pattern matches specialized function for N = 3 which:

    . Decide (ABC )D vs. A(BCD)

    . Heuristic: cost(ABC ) := columns(A) · rows(C )

    . Recursively calls reordering function with N = 2

    . Thus: linear (but sub-optimal) reordering of operations

    . Remember: all this at compile-time!

    Leo Antunes and Markus Kruber Armadillo

  • 12

    IntroductionTemplate Metaprogramming

    TMP in ArmadilloConclusion

    OverviewExamples

    Complex example: A*B*C*D

    . Instantiated as:Glue

    . Determines ”depth” of expression: N = 3(amount of operators with same precedence)

    . Pattern matches specialized function for N = 3 which:

    . Decide (ABC )D vs. A(BCD)

    . Heuristic: cost(ABC ) := columns(A) · rows(C )

    . Recursively calls reordering function with N = 2

    . Thus: linear (but sub-optimal) reordering of operations

    . Remember: all this at compile-time!

    Leo Antunes and Markus Kruber Armadillo

  • 13

    IntroductionTemplate Metaprogramming

    TMP in ArmadilloConclusion

    OverviewExamples

    Complex example: A*B*C*D (cont.)

    . ((AB)C )D 3

    ⇒ O(n3)

    . (A(BC ))D 3

    ⇒ O(n3)

    . A((BC )D) 3

    ⇒ O(n3)

    . A(B(CD)) 3

    ⇒ O(n3)

    . (AB)(CD) 7

    ⇒ O(n2)

    n Armadillo All

    3 2 24 4 55 8 146 16 427 32 132

    n

    n

    A

    1

    n

    B

    n1

    C

    n

    n

    D

    Leo Antunes and Markus Kruber Armadillo

  • 13

    IntroductionTemplate Metaprogramming

    TMP in ArmadilloConclusion

    OverviewExamples

    Complex example: A*B*C*D (cont.)

    . ((AB)C )D 3

    ⇒ O(n3)

    . (A(BC ))D 3

    ⇒ O(n3)

    . A((BC )D) 3

    ⇒ O(n3)

    . A(B(CD)) 3

    ⇒ O(n3)

    . (AB)(CD) 7

    ⇒ O(n2)

    n Armadillo All

    3 2 24 4 55 8 146 16 427 32 132

    n

    n

    A

    1

    n

    B

    n1

    C

    n

    n

    D

    Leo Antunes and Markus Kruber Armadillo

  • 13

    IntroductionTemplate Metaprogramming

    TMP in ArmadilloConclusion

    OverviewExamples

    Complex example: A*B*C*D (cont.)

    . ((AB)C )D 3 ⇒ O(n3)

    . (A(BC ))D 3 ⇒ O(n3)

    . A((BC )D) 3 ⇒ O(n3)

    . A(B(CD)) 3 ⇒ O(n3)

    . (AB)(CD) 7 ⇒ O(n2)

    n Armadillo All

    3 2 24 4 55 8 146 16 427 32 132

    n

    n

    A

    1

    n

    B

    n1

    C

    n

    n

    D

    Leo Antunes and Markus Kruber Armadillo

  • 13

    IntroductionTemplate Metaprogramming

    TMP in ArmadilloConclusion

    OverviewExamples

    Complex example: A*B*C*D (cont.)

    . ((AB)C )D 3 ⇒ O(n3)

    . (A(BC ))D 3 ⇒ O(n3)

    . A((BC )D) 3 ⇒ O(n3)

    . A(B(CD)) 3 ⇒ O(n3)

    . (AB)(CD) 7 ⇒ O(n2)

    n Armadillo All

    3 2 24 4 55 8 146 16 427 32 132

    n

    n

    A

    1

    n

    B

    n1

    C

    n

    n

    D

    Leo Antunes and Markus Kruber Armadillo

  • 14

    IntroductionTemplate Metaprogramming

    TMP in ArmadilloConclusion

    OverviewExamples

    Other improvements

    . A = A1 + A2 + . . . + AnNo temporary matrix besides output

    . y = A−1x → y = solve(A, x)Faster and more accurate

    . inv(diag(A))Elementwise inverse

    . as scalar(r ∗ X ∗ q) with r ∈ R1×n,X ∈ Rn×n, q ∈ Rn×1Result of computation is 1× 1 matrix

    . Lots more!

    Leo Antunes and Markus Kruber Armadillo

  • 14

    IntroductionTemplate Metaprogramming

    TMP in ArmadilloConclusion

    OverviewExamples

    Other improvements

    . A = A1 + A2 + . . . + AnNo temporary matrix besides output

    . y = A−1x → y = solve(A, x)Faster and more accurate

    . inv(diag(A))Elementwise inverse

    . as scalar(r ∗ X ∗ q) with r ∈ R1×n,X ∈ Rn×n, q ∈ Rn×1Result of computation is 1× 1 matrix

    . Lots more!

    Leo Antunes and Markus Kruber Armadillo

  • 14

    IntroductionTemplate Metaprogramming

    TMP in ArmadilloConclusion

    OverviewExamples

    Other improvements

    . A = A1 + A2 + . . . + AnNo temporary matrix besides output

    . y = A−1x → y = solve(A, x)Faster and more accurate

    . inv(diag(A))Elementwise inverse

    . as scalar(r ∗ X ∗ q) with r ∈ R1×n,X ∈ Rn×n, q ∈ Rn×1Result of computation is 1× 1 matrix

    . Lots more!

    Leo Antunes and Markus Kruber Armadillo

  • 14

    IntroductionTemplate Metaprogramming

    TMP in ArmadilloConclusion

    OverviewExamples

    Other improvements

    . A = A1 + A2 + . . . + AnNo temporary matrix besides output

    . y = A−1x → y = solve(A, x)Faster and more accurate

    . inv(diag(A))Elementwise inverse

    . as scalar(r ∗ X ∗ q) with r ∈ R1×n,X ∈ Rn×n, q ∈ Rn×1Result of computation is 1× 1 matrix

    . Lots more!

    Leo Antunes and Markus Kruber Armadillo

  • 14

    IntroductionTemplate Metaprogramming

    TMP in ArmadilloConclusion

    OverviewExamples

    Other improvements

    . A = A1 + A2 + . . . + AnNo temporary matrix besides output

    . y = A−1x → y = solve(A, x)Faster and more accurate

    . inv(diag(A))Elementwise inverse

    . as scalar(r ∗ X ∗ q) with r ∈ R1×n,X ∈ Rn×n, q ∈ Rn×1Result of computation is 1× 1 matrix

    . Lots more!

    Leo Antunes and Markus Kruber Armadillo

  • 15

    IntroductionTemplate Metaprogramming

    TMP in ArmadilloConclusion

    Conclusion

    3 Easy to use for non-LA-experts

    3 Some optimization comes built-in

    7 Cannot optimize all cases

    7 Debugging for more complex code can be a problem (C++)

    . Our recommendation: Use it! Don’t try to understand it.

    Leo Antunes and Markus Kruber Armadillo

  • 15

    IntroductionTemplate Metaprogramming

    TMP in ArmadilloConclusion

    Conclusion

    3 Easy to use for non-LA-experts

    3 Some optimization comes built-in

    7 Cannot optimize all cases

    7 Debugging for more complex code can be a problem (C++)

    . Our recommendation: Use it! Don’t try to understand it.

    Leo Antunes and Markus Kruber Armadillo

  • 15

    IntroductionTemplate Metaprogramming

    TMP in ArmadilloConclusion

    Conclusion

    3 Easy to use for non-LA-experts

    3 Some optimization comes built-in

    7 Cannot optimize all cases

    7 Debugging for more complex code can be a problem (C++)

    . Our recommendation: Use it! Don’t try to understand it.

    Leo Antunes and Markus Kruber Armadillo

  • 16

    IntroductionTemplate Metaprogramming

    TMP in ArmadilloConclusion

    Thanks! Questions?

    Sources:

    i Conrad Sanderson, Armadillo: An Open Source C++ Linear AlgebraLibrary for Fast Prototyping and Computationally Intensive Experiments,Technical Report, NICTA, 2010.

    i Armadillo source code.i C++ Programming, chapter Template Meta-Programming, Wikibooksi Matrix chain multiplication, Wikipedia

    Leo Antunes and Markus Kruber Armadillo

    http://arma.sourceforge.net/armadillo_nicta_2010.pdfhttp://arma.sourceforge.net/armadillo_nicta_2010.pdfhttp://arma.sourceforge.net/armadillo_nicta_2010.pdfhttp://sourceforge.net/projects/arma/files/https://en.wikibooks.org/wiki/C++_Programming/Templates/Template_Meta-Programminghttps://en.wikipedia.org/wiki/Matrix_chain_multiplication

    IntroductionWhat is Armadillo?Code example

    Template MetaprogrammingC++ TemplatesOverviewExamples

    TMP in ArmadilloOverviewExamples

    Conclusion