
Mathematical Institute, University of Oxford

Numerical simulations using approximate random numbers

Oliver Sheridan-Methven
oliver.sheridan-methven@maths.ox.ac.uk

Thursday 6th February 2020

Supervisors:
Prof. Michael Giles (Oxford)
Dr Christopher Goodyer (Arm)

EPSRC Centre for Doctoral Training in Industrially Focused Mathematical Modelling

Some HPC dogmas

Of course my code will run faster if it’s vectorised.

No

Of course my code will run faster if I work in a lower precision.

No

Well my compiler should be clever enough to decide what’s best.

No

Half-precision is useless for anyone who wants accurate answers!

No

You can’t design code that performs well on arbitrary vector lengths.

No

Overview

1 What is an approximate random variable?

2 When can we use them?

3 Do we still get the right answer?

4 What precisions can we use, and when?

5 Conclusions

Stochastic simulations

Weather: Tomorrow’s temperature given today’s weather.

Finance: Value of contract given today’s prices.

Traffic: Rush hour traffic given morning congestion.

Health: Risk of later secondary condition given current health.

Technology: Expected number of online visitors given current search trends.

Mathematical simulations

[Figure labels: $f(\cdot)$, $F(\cdot)$, $F^{-1}(u)$.]
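The labels $f(\cdot)$, $F(\cdot)$ and $F^{-1}(u)$ suggest sampling by inversion: draw a uniform $u$ and return $F^{-1}(u)$. The following is a minimal sketch under that assumption, not the speaker's code. The function name `sample_by_inversion` is hypothetical, and the Python standard library's `NormalDist().inv_cdf` stands in for an exact inverse Gaussian CDF.

```python
import random
from statistics import NormalDist


def sample_by_inversion(inverse_cdf, n_samples, seed=0):
    """Draw samples as F^{-1}(U) with U uniform on (0, 1)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        u = rng.random()
        while u == 0.0:  # random() can return 0.0, where inv_cdf is undefined
            u = rng.random()
        samples.append(inverse_cdf(u))
    return samples


# Standard Gaussian case: F^{-1} is the inverse Gaussian CDF.
gaussian_samples = sample_by_inversion(NormalDist().inv_cdf, 50_000)
sample_mean = sum(gaussian_samples) / len(gaussian_samples)
sample_var = sum((z - sample_mean) ** 2 for z in gaussian_samples) / len(gaussian_samples)
```

The sample mean and variance should be close to 0 and 1. Most of the cost sits in the call to the inverse CDF, which is the part the approximations in this talk replace.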

The inverse Gaussian CDF $\Phi^{-1}(\cdot)$

[Figure: $\Phi^{-1}(x)$ against $x \in [0, 1]$, vertical axis from $-5$ to $5$.]

Tails kill performance in SVE and FP16

[Figure: $\Phi^{-1}(x)$ against $x \in [0, 1]$, vertical axis from $-5$ to $5$.]

Piecewise constant approximation

[Figure: Uniform piecewise constant approximation of $\Phi^{-1}(\cdot)$, showing $\Phi^{-1}(x)$ and $\tilde{\Phi}^{-1}(x)$ against $x \in [0, 1]$.]

$\mathbb{V}(Z - \tilde{Z}) \approx 1.6 \times 10^{-4}$ for 1024 partitions.
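A uniform piecewise constant approximation amounts to a lookup table indexed by the uniform input. The sketch below is illustrative only. It assumes each bin stores the exact inverse CDF at the bin midpoint, a choice the slide does not specify, and the names `build_table`, `approx_inv_cdf` and `mean_squared_error` are hypothetical.

```python
import random
from statistics import NormalDist

INV_CDF = NormalDist().inv_cdf  # reference inverse Gaussian CDF


def build_table(n_bins=1024):
    """Tabulate the inverse CDF at the midpoint of each uniform bin (assumed rule)."""
    return [INV_CDF((k + 0.5) / n_bins) for k in range(n_bins)]


def approx_inv_cdf(u, table):
    """Piecewise constant approximation: scale, truncate to an index, load."""
    k = min(int(u * len(table)), len(table) - 1)
    return table[k]


def mean_squared_error(table, n_samples=100_000, seed=1):
    """Monte Carlo estimate of E[(Z - Z~)^2], with Z and Z~ driven by the same uniform."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        u = rng.random()
        while u == 0.0:  # the exact inverse CDF is undefined at 0
            u = rng.random()
        diff = INV_CDF(u) - approx_inv_cdf(u, table)
        total += diff * diff
    return total / n_samples


table = build_table(1024)
mse = mean_squared_error(table)
```

The lookup has no branches and no transcendental functions, so it vectorises easily. The error is concentrated in the two outermost bins, where $\Phi^{-1}$ is unbounded.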

Piecewise linear dyadic approximation

[Figure: Piecewise linear dyadic approximation of $\Phi^{-1}(\cdot)$, showing $\Phi^{-1}(x)$ and $\tilde{\Phi}^{-1}(x)$ against $x \in [0, 1]$.]

$\mathbb{V}(Z - \tilde{Z}) \approx 4.3 \times 10^{-5}$ for 31 intervals.
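One way to realise a dyadic piecewise linear approximation is sketched below. It rests on several assumptions not taken from the slides: the intervals are $[2^{-(k+1)}, 2^{-k}]$ on $(0, 1/2]$, mirrored about $1/2$; each segment is the chord through the exact endpoint values rather than an optimised fit; and the interval index is read from the floating point exponent. `N_INTERVALS` and the function names are hypothetical, and the sketch makes no attempt to reproduce the variance quoted above.

```python
import math
from statistics import NormalDist

INV_CDF = NormalDist().inv_cdf
N_INTERVALS = 16  # hypothetical number of dyadic intervals on (0, 1/2]


def build_dyadic_coefficients(n_intervals=N_INTERVALS):
    """Slope and intercept of the chord on [2^-(k+1), 2^-k] for k = 1..n_intervals."""
    coefficients = {}
    for k in range(1, n_intervals + 1):
        left, right = 2.0 ** -(k + 1), 2.0 ** -k
        slope = (INV_CDF(right) - INV_CDF(left)) / (right - left)
        intercept = INV_CDF(left) - slope * left
        coefficients[k] = (slope, intercept)
    return coefficients


def approx_inv_cdf(u, coefficients):
    """Piecewise linear approximation for u in (0, 1), using symmetry about 1/2."""
    flip = u > 0.5
    v = 1.0 - u if flip else u             # fold onto (0, 1/2]
    _, exponent = math.frexp(v)            # v = m * 2**exponent with 0.5 <= m < 1
    k = min(max(-exponent, 1), max(coefficients))  # clamp to the tabulated intervals
    slope, intercept = coefficients[k]
    z = slope * v + intercept
    return -z if flip else z


coefficients = build_dyadic_coefficients()
max_error = max(
    abs(approx_inv_cdf(i / 1000, coefficients) - INV_CDF(i / 1000))
    for i in range(1, 1000)
)
```

Because the intervals shrink geometrically towards 0 and 1, a small table resolves the tails far better than a uniform grid of the same size, and the index comes from the exponent bits without a search.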

Polynomial approximation

[Figure: Cubic approximation of $\Phi^{-1}(\cdot)$, showing $\Phi^{-1}(x)$ and $\tilde{\Phi}^{-1}(x)$ against $x \in [0, 1]$.]

$\mathbb{V}(Z - \tilde{Z}) \approx 2.6 \times 10^{-3}$ for a 7th order polynomial.

Speed of execution

Average speed (clock cycles)

Intel MKL              8
Lookup table           2
GNU GSL              128
Cephes                83
NAG                  117
GSL (optimised)       17
Giles [1] (NVIDIA?)   12

Accuracy and precision

Multilevel Monte Carlo

$$\mathbb{E}(P) \approx \mathbb{E}\big(\hat{P}_{\text{Accurate}}\big) = \mathbb{E}\big(\hat{P}_{\text{Crude}}\big) + \mathbb{E}\big(\hat{P}_{\text{Accurate}} - \hat{P}_{\text{Crude}}\big)$$
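The identity above can be illustrated with a two-level estimator: many cheap samples for the crude term and a few coupled samples for the correction. This is a sketch under stated assumptions rather than the estimator used in the talk. The payoff $P = \max(Z, 0)$ is a hypothetical choice (its exact mean is $1/\sqrt{2\pi}$), the crude sampler is a 1024-bin midpoint lookup table, and the sample counts are arbitrary.

```python
import random
from statistics import NormalDist

INV_CDF = NormalDist().inv_cdf
N_BINS = 1024
# Cheap proxy for the inverse CDF (assumed midpoint rule).
TABLE = [INV_CDF((k + 0.5) / N_BINS) for k in range(N_BINS)]


def payoff(z):
    """Hypothetical quantity of interest P = max(Z, 0)."""
    return max(z, 0.0)


def uniform(rng):
    u = rng.random()
    while u == 0.0:  # keep u inside (0, 1) for the exact inverse CDF
        u = rng.random()
    return u


def crude_normal(u):
    return TABLE[min(int(u * N_BINS), N_BINS - 1)]


def two_level_estimate(n_crude, n_correction, seed=0):
    rng = random.Random(seed)
    # Crude term: many cheap samples from the approximate distribution only.
    crude = sum(payoff(crude_normal(uniform(rng))) for _ in range(n_crude)) / n_crude
    # Correction term: few samples of the difference, both driven by the SAME uniform.
    correction = 0.0
    for _ in range(n_correction):
        u = uniform(rng)
        correction += payoff(INV_CDF(u)) - payoff(crude_normal(u))
    return crude + correction / n_correction


estimate = two_level_estimate(n_crude=200_000, n_correction=2_000)
```

Sharing the uniform between the accurate and crude samples keeps the variance of the correction small, so few expensive samples are needed and most of the work runs through the cheap sampler.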

Nested multilevel Monte Carlo I

Monte Carlo (MC)

Multilevel Monte Carlo (MLMC)

Quantised multilevel Monte Carlo (QMLMC)

Reduced precision quantised multilevel Monte Carlo (RPQMLMC)

Monte Carlo

Temporal discretisation

Quantised distribution

Reduced (half) precision

Discretisation, quantisation, and roundoff

$$\text{Correction} = \text{Discretisation} \times \text{Quantisation} + \underbrace{\text{Roundoff}}_{1/\text{Discretisation}}$$

When can’t we use half-precision?

[Figure: Reduced precision quantised multilevel Monte Carlo. Variance (log scale, $2^{-50}$ to $2^{0}$) against level $l$ from 0 to 30, with $N = 2^l$, for three series:
$\mathbb{V}(\hat{X}^f_{64} - \hat{X}^c_{64} - \tilde{X}^f_{16} + \tilde{X}^c_{16})$,
$\mathbb{V}(\hat{X}^f_{64} - \hat{X}^c_{64} - \tilde{X}^f_{32} + \tilde{X}^c_{32})$,
$\mathbb{V}(\hat{X}^f_{64} - \hat{X}^c_{64} - \tilde{X}^f_{64} + \tilde{X}^c_{64})$.]

Numerical results

Running our QMLMC estimator with a single quantisation level (1024 bins) for finely discretised paths gives the following average time per path:

Relative accuracy $\varepsilon = 10^{-3}$

Time per path ($10^{-4}$ s)   Memory intensive   Work intensive
Original MLMC                 24.8               17.0
Quantised MLMC                13.2               3.99
Speed-up                      ×2                 ×4

Level         Paths
Original      1 920 000
Quantised     1 980 000
Correction    14 000

Conclusions

Errors from using a cheap proxy distribution can be quantified and controlled by the introduction of a nested multilevel Monte Carlo framework.

There is a degree of freedom in the construction of this proxy. Put the results in the low level cache, or use a very cheap (piecewise) polynomial.

The resultant approximations converge.

The approximate schemes scale as we move to wider vectors (SVE) and lower precisions (FP16), benefiting from greater SIMD parallelisation and faster FP16 calculations.

Half-precision can be used in the coarsest calculations (which consume most of the computer time). (BFloat16 may require Kahan summation...)
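The remark on Kahan summation can be illustrated by emulating reduced precision accumulation. The sketch below uses IEEE binary16 (FP16) rather than BFloat16, because the Python standard library's `struct` module provides the `e` format for binary16 only. The helper names and the test data are hypothetical.

```python
import struct


def to_half(x):
    """Round a Python float to the nearest IEEE binary16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def naive_sum_half(values):
    """Running sum with every intermediate result rounded to half precision."""
    total = 0.0
    for x in values:
        total = to_half(total + to_half(x))
    return total


def kahan_sum_half(values):
    """Kahan (compensated) summation with all variables held in half precision."""
    total = 0.0
    compensation = 0.0  # running estimate of the low-order bits lost so far
    for x in values:
        y = to_half(to_half(x) - compensation)
        t = to_half(total + y)
        compensation = to_half(to_half(t - total) - y)
        total = t
    return total


values = [0.1] * 4096
exact = sum(to_half(x) for x in values)  # same inputs, accumulated in double precision
naive = naive_sum_half(values)
kahan = kahan_sum_half(values)
```

Here the naive sum stalls once the running total is large enough that adding one more term rounds back to the same value, whereas the compensated sum stays close to the double precision reference. Long sums of small increments, as in finely discretised paths, are the situation where this matters.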

References I

[1] Mike Giles. Approximating the erfinv function. In GPU Computing Gems, Jade Edition, volume 2, pages 109–116. Elsevier, 2011.
