open source and traditional technical computing alan edelman massachusetts institute of technology...

35
Open Source and Traditional Technical Computing Alan Edelman Massachusetts Institute of Technology Scilabtec10 June 16, 2010

Upload: margaretmargaret-beasley

Post on 30-Dec-2015

214 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Open Source and Traditional Technical Computing Alan Edelman Massachusetts Institute of Technology Scilabtec10 June 16, 2010

Open Source andTraditional Technical Computing

Alan EdelmanMassachusetts Institute of Technology

Scilabtec10June 16, 2010

Page 2: Open Source and Traditional Technical Computing Alan Edelman Massachusetts Institute of Technology Scilabtec10 June 16, 2010

How should

Compete/cooperate with

octave

Page 3: Open Source and Traditional Technical Computing Alan Edelman Massachusetts Institute of Technology Scilabtec10 June 16, 2010

The 800 pound gorilla

• Hard to beat that “first date” experience• Great numerical analyst Cleve Moler• People think MATLAB’s answer is

right, even when it’s wrong• Incredible documentation• Years and years of experience• Tons of functionality, but it’s not the

functionality

Page 4: Open Source and Traditional Technical Computing Alan Edelman Massachusetts Institute of Technology Scilabtec10 June 16, 2010

… vs. the power of“download, next, agree, next,

next, next, install”

Page 5: Open Source and Traditional Technical Computing Alan Edelman Massachusetts Institute of Technology Scilabtec10 June 16, 2010

Critical: trust in the numerics

Page 6: Open Source and Traditional Technical Computing Alan Edelman Massachusetts Institute of Technology Scilabtec10 June 16, 2010

but:• Typical user: In grade school, math was

either right or wrong. Getting it “wrong” meant: bad boy/bad girl. (Very shameful!)

• By extension, if a computer gets it wrong “Shame, Bad computer! Bad programmer!”

Most of the world’s users don’t understand ill-conditioning, good/bad relative accuracy, forward/backward error.

My personal hypothesis regarding the Pentium Bug of 1993: Bad division might have lead somebody to make money and somebody to lose money, but the affected parties were blissfully unaware!

Page 7: Open Source and Traditional Technical Computing Alan Edelman Massachusetts Institute of Technology Scilabtec10 June 16, 2010

Time feels right to harness…

Kahan

Page 8: Open Source and Traditional Technical Computing Alan Edelman Massachusetts Institute of Technology Scilabtec10 June 16, 2010

• Tons of Numerical Analysis Classes• Research from

– Classical & IEEE numerical analysis– Algorithmic design– Function Encyclopedias and Software

• Experience of libraries and computing environments– LAPACK, MATLAB®, Mathematica, Maple– scilab, python, R

But none of these…

Teach a young library writer what to do…

Page 9: Open Source and Traditional Technical Computing Alan Edelman Massachusetts Institute of Technology Scilabtec10 June 16, 2010

Misconceptions• There is a right and a wrong.

– In fact there are (poorly understood) tradeoffs.

• Computers are highly consistent:– Programs are

remarkably idiosynchratic

Page 10: Open Source and Traditional Technical Computing Alan Edelman Massachusetts Institute of Technology Scilabtec10 June 16, 2010

Back to Open source

• With open source, we can “see” and critique the algorithms.

• With private code we can only test

Page 11: Open Source and Traditional Technical Computing Alan Edelman Massachusetts Institute of Technology Scilabtec10 June 16, 2010

• Math: Det(matrix of integers)=integer

Page 12: Open Source and Traditional Technical Computing Alan Edelman Massachusetts Institute of Technology Scilabtec10 June 16, 2010

But is it right?• a=sign(randn(27)),det(a)• a = 1 -1 -1 1 -1 1 -1 -1 -1 -1 -1 1 -1 -1 -1 -1 -1 1 -1 -1 -1 -1 1 -1 1 1 1 -1 1 -1 -1 1 1 1 1 1 -1 1 -1 1 1 1 1 -1 -1 1 -1 -1 1 1 -1 1 1 -1 -1 -1 1 -1 -1 -1 -1 1 -1 -1 -1 -1 1 -1 1 -1 -1 1 1 -1 -1 1 -1 -1 -1 -1 -1 -1 1 1 -1 -1 -1 -1 -1 1 -1 -1 1 1 1 -1 -1 -1 -1 -1 1 1 1 -1 1 1 1 -1 1 1 1 1 1 1 -1 1 -1 -1 1 1 -1 1 -1 -1 1 1 -1 -1 1 1 1 1 -1 1 -1 -1 -1 -1 -1 -1 -1 1 -1 1 -1 1 -1 1 -1 1 1 1 1 1 -1 1 -1 1 -1 -1 1 -1 1 1 1 -1 1 1 -1 -1 1 1 -1 -1 1 1 -1 1 -1 1 1 1 1 1 -1 -1 1 1 -1 -1 1 -1 1 1 1 -1 -1 1 -1 1 1 1 1 -1 1 1 1 -1 1 1 -1 1 1 1 -1 -1 -1 1 -1 1 1 -1 -1 -1 -1 1 1 1 1 1 -1 1 -1 1 1 -1 -1 1 1 1 1 -1 -1 -1 -1 -1 -1 1 1 -1 -1 -1 1 1 -1 -1 1 -1 -1 1 1 1 1 -1 1 1 1 -1 -1 1 1 1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 -1 -1 -1 1 -1 1 -1 1 1 -1 1 1 1 -1 -1 -1 1 -1 -1 -1 -1 1 -1 -1 -1 -1 1 -1 -1 1 1 -1 1 -1 -1 1 1 -1 -1 1 -1 -1 1 1 -1 1 1 -1 1 1 -1 -1 -1 1 1 1 1 -1 -1 1 -1 1 1 -1 1 -1 -1 1 -1 -1 1 -1 1 -1 1 1 1 -1 -1 1 1 -1 -1 -1 -1 -1 1 1 1 1 1 -1 1 1 -1 -1 -1 -1 1 -1 1 -1 1 -1 -1 1 -1 -1 -1 -1 -1 -1 1 -1 1 -1 1 1 -1 1 -1 -1 1 1 1 -1 1 1 -1 1 -1 -1 1 1 -1 -1 1 -1 -1 1 1 -1 1 -1 1 1 -1 1 1 -1 1 1 -1 1 1 -1 1 1 1 -1 1 -1 1 -1 1 1 1 1 -1 -1 -1 1 1 1 -1 1 1 1 -1 -1 -1 1 -1 1 -1 -1 -1 -1 1 1 -1 -1 1 1 -1 -1 -1 1 -1 1 1 1 1 -1 -1 -1 -1 1 1 -1 1 -1 1 -1 1 1 1 1 1 -1 -1 -1 -1 -1 -1 1 1 1 -1 -1 -1 1 1 -1 -1 -1 -1 -1 -1 -1 -1 1 1 1 -1 -1 1 -1 -1 1 -1 -1 1 1 -1 -1 -1 1 -1 -1 1 1 -1 -1 1 -1 -1 -1 1 -1 1 -1 -1 1 1 -1 1 -1 1 1 -1 1 -1 1 -1 -1 1 -1 1 1 1 1 1 -1 1 -1 1 -1 1 -1 1 1 1 1 1 1 1 1 -1 -1 -1 1 -1 -1 1 1 1 -1 -1 -1 1 -1 1 -1 -1 1 -1 -1 -1 -1 1 -1 -1 1 1 1 1 1 -1 1 1 1 1 1 -1 1 -1 1 -1 -1 1 1 -1 -1 1 1 1 -1 1 -1 -1 1 1 -1 1 1 1 -1 1 -1 1 -1 1 1 -1 1 -1 1 1 -1 1 -1 -1 1 -1 -1 1 1 1 -1 1 -1 -1 1 -1 1 1 -1 -1 1 1 1 -1 1 1 -1 1 1 1 1 1 -1 1 -1 1 -1 1 1 -1 11 1 1 1 -1 -1 -1 1 1 -1 1 -1 -1 1 -1 1 -1 1 -1 -1 -1 -1 -1 1 -1 -1 -1

• ans = 839466457497601

Page 13: Open Source and Traditional Technical Computing Alan Edelman Massachusetts Institute of Technology Scilabtec10 June 16, 2010

Continuity is a function property, not seen at a point

octave:> atan(1+i*2^64*[1 1+eps]))ans =

   1.5708  -1.5708Sloppy?

Okay? tan(x+pi) is tan(x) after all so is it okay to return atan(x) + pi * any integer

Principle: Continuity is required even if special handling is required on branch cuts, stratifications, etc.

What about QR? Some people argue it’s different, it’s okay to just produce a Q and R

such that QR is nearly A Nonsense, continuity is required!

Page 14: Open Source and Traditional Technical Computing Alan Edelman Massachusetts Institute of Technology Scilabtec10 June 16, 2010

Example 2: Continuity• Arctan: Analytic in complex plane

– Except at usual branch cuts: ±i(1,∞)– Undefined at ±i (Arguably NaN+i*Inf, but there’s

another nightmare for another time)– Deserves to be continuous off the imaginary axis?

• Example along real(z) = 1:

octave:> atan(1+i*2^64*[1 1+eps]))ans =

   1.5708  -1.5708

Sloppy?Okay? tan(x+pi) is tan(x) after all so is it okay to

return atan(x) + pi * any integer

Page 15: Open Source and Traditional Technical Computing Alan Edelman Massachusetts Institute of Technology Scilabtec10 June 16, 2010

Example1 : Numerical Accuracy• Statistics Function: “erfinv” (erfinv(erf)=“Identity function”)• >> erfinv([.6827 .9545  0.9973])*sqrt(2)

ans =    1.0000    2.0000    3.0000

• In a sample of floating point numbers from [1,2) how unusual is it to be an integer?>> erfinv(1-eps)Exact: 5.8050186831934533001812 MATLAB: 5.805365688510648

erfinv(1-eps - eps/2) is exactly 5.77049185355333287054782 anderfinv(1-eps + eps/2) is exactly 5.86358474875516792720766 condition number is O(1e14)

• Plain old “inv”: (inv(A) composed with A=“Identity”)• >> a=0; inv(a) Warning: Matrix is close to singular or badly scaled..

Page 16: Open Source and Traditional Technical Computing Alan Edelman Massachusetts Institute of Technology Scilabtec10 June 16, 2010

Possible Conclusions?• Bad Algorithm: should have had a small forward error

• Bad MATLAB®: should have warned us

• Good backward error so we need not worry?

• The user was already in extremely bad shape or extremely good shape anyway:– Already “been hit by a truck”. (Think eigenvalues when pseudos!)– Somehow won’t matter. (Think preconditioning!)

• Problem is too ridiculous (but this attitude always comes back to haunt you …)

Page 17: Open Source and Traditional Technical Computing Alan Edelman Massachusetts Institute of Technology Scilabtec10 June 16, 2010

Speaking of QR• What about portability?

• Need the Rosetta Stone

Page 18: Open Source and Traditional Technical Computing Alan Edelman Massachusetts Institute of Technology Scilabtec10 June 16, 2010

Speaking of QR• What about portability? A=ones(4,4)

Page 19: Open Source and Traditional Technical Computing Alan Edelman Massachusetts Institute of Technology Scilabtec10 June 16, 2010

• No different from atan• Good enough to take atan+pi*(random integer?)• Is it okay to take Q*diag(random(signs?))?

– There is an argument to avoid gratuitous discontinuity! Just like atan

– Many people take [Q,R]=qr(randn(n)) to get uniformly distributed orthogonal matrices! But broken because Q is only piecewise continuous.

– Backward error analysis has a subtlety. We guarantee that [Q,R] is “a” QR decomposition of a nearby matrix, not “the” QR decomposition of a nearby matrix. Even without roundoff, there may be no algorithm specified that gives “THIS” QR for the rounded output.

Continuity is a function property, not seen at a point: QR

Page 20: Open Source and Traditional Technical Computing Alan Edelman Massachusetts Institute of Technology Scilabtec10 June 16, 2010

Example 3: Embeddings

• log2(2^k)? log10(10^k)?• (double)^(integer)?

• Integers are embedded in reals• Reals embedded in complexes• 2d arrays embedded in nd arrays

Determinant:>> m=magic(6); [det(m) prod(svd(m)) prod(diag(lu(m)))]' ans = 1.0e-07 * 0 0.147868183627696 0.080494970688960

Page 21: Open Source and Traditional Technical Computing Alan Edelman Massachusetts Institute of Technology Scilabtec10 June 16, 2010

Opinion

• I like the embedding concept

• But the answer should be right, continuous, and monotonic without side affects

• Terrible embedding: sort

Page 22: Open Source and Traditional Technical Computing Alan Edelman Massachusetts Institute of Technology Scilabtec10 June 16, 2010

Sorting>> x=[-1 1 2]x = -1 1 2>> sort(x)ans = -1 1 2>> x=[-1 1 2*i]x = -1.0000 1.0000 0 + 2.0000i>> sort(x)ans = 1.0000 -1.0000 0 + 2.0000i>> 1 < (2*i)ans = 0>> 1 > (2*i)ans = 1>> max(1,i)ans = 1>> min(1,i)ans = 1>> max(i,1)ans = 0 + 1.0000i>> min(i,1)ans = 0 + 1.0000i

>> a=[-2 -1  1 2 3 ]; roots(poly(a))'ans =   3.0000   -2.0000   -1.0000    2.0000    1.0000

>> x=[1 -1 i]'; y=[1;-1;-i]; [x y x-y]ans = 1.0000 1.0000 0 -1.0000 -1.0000 0 0 - 1.0000i 0 - 1.0000i 0 >> [sort(x) sort(y)]ans = -1.0000 0 - 1.0000i 0 - 1.0000i 1.0000

1.0000 -1.0000

Page 23: Open Source and Traditional Technical Computing Alan Edelman Massachusetts Institute of Technology Scilabtec10 June 16, 2010

Lowering Friction between systemsFirst Element

Page 24: Open Source and Traditional Technical Computing Alan Edelman Massachusetts Institute of Technology Scilabtec10 June 16, 2010

Accuracy: Sine Function Extreme Argument sin(2^64)=0.0235985099044395586343659…One IEEE double ulp higher: sin(2^64+4096)=0.6134493700154282634382759…over 651 wavelengths higher. 2^64 is a good test caseMaple 10 evalhf(sin(2^64)); 0.0235985099044395581

note evalf(sin(2^64)) does not accurately use 2^64 but rather computes evalf(sin(evalf(2^64,10))) Mathematica 6.0 Print[N[Sin[2^64]]]; Print[N[Sin[2^64],20]];

0.312821*

0.023598509904439558634MATLAB sin(2^64) 0.023598509904440

sin(single(2^64)) 0.0235985Octave 2.1.72: sin(2^64) 0.312821314503348*

python 2.5 numpy sin(2**64) 0.24726064630941769**R 2.4.0 format(sin(2^64),digits=17) 0.24726064630941769**

Scilab format(25);sin(2^64) 0.0235985099044395581214

Notes: Maple and Mathematica use both hardware floating point and custom vpaMATLAB discloses use of FDLIBM. MATLAB and extra precision MAPLE and Mathematica can claim small forward errorAll others can claim small backward error Excel doesn’t compute 2^64 accurately and sin is an error. Google gives 0.312821315

Ruby on http://tryruby.hobix.com/ gives -0.35464… for Math.sin(2**64). On a sun4 the correct was given with Ruby.Relative Backward Error <= (2pi/2^64) = 3e-19 Relative Condition Number = abs(cos(2^64)dx/sin(2^64))/(dx/2^64) = 8e20

One can see accuracy drainage going back to around 2^21 or so, by powers of 2

*Mathematica and python have been observed to give the correct answer on certain 64 bit linux machines. **The python and R above numbers listed here were taken with Windows XP. The .3128 number was seen with python on a 32 bit linux machine. The correct answer was seen with R on a sun4 machine.It’s very likely that default N in MMA, Octave, Python, and R pass thru to various libraries, such as LIBM, and MSVCRT.DLL.

Page 25: Open Source and Traditional Technical Computing Alan Edelman Massachusetts Institute of Technology Scilabtec10 June 16, 2010

Edge Case: Sine Function at Infinity

• Maple: sin(infinity); undefined evalf(sin(infinity)); Float(undefined)• Mathematica: Sin[Infinity] Interval[{-1., 1.}]• MATLAB: sin(inf) NaN• Octave: “ “ “ “• Numpy: “ “ -1.#IND• R: sin(Inf) NaN (and a

warning)

• Scilab: sin(%inf) Nan

Page 26: Open Source and Traditional Technical Computing Alan Edelman Massachusetts Institute of Technology Scilabtec10 June 16, 2010

Coverage: Erf Function Complex Support

python/scipy is the only numerical package good• Maple: evalf(erf(I)); 1.650425759 I• Mathematica: N[Erf[I]] 0. + 1.65043 I• MATLAB: erf(i) ??? Input must be real.

• Octave: “ “ error: erf: unable to handle complex arguments

• Python/Scipy: erf(1j) 1.6504257588j• R: pnorm(1i) Error in pnorm … Non-numeric argument …

• Scilab: erf(%i) !--error 52 argument must be a real …

Page 27: Open Source and Traditional Technical Computing Alan Edelman Massachusetts Institute of Technology Scilabtec10 June 16, 2010

Optimization

• Would love to see the “Consumer Reports Style” Article

Page 28: Open Source and Traditional Technical Computing Alan Edelman Massachusetts Institute of Technology Scilabtec10 June 16, 2010

Scorecard Approach

Page 29: Open Source and Traditional Technical Computing Alan Edelman Massachusetts Institute of Technology Scilabtec10 June 16, 2010

Unique Opportunity

• Raise the bar for all functions:

– Linear Algebra!

– Elementary Functions

– Special Functions

– Hard Functions: Optimization/Diff Eqs

– Combinatorial and Set Functions• What about

– 0-d, 1-d, n-d arrays, empty arrays– Inf, Not Available (NaN)– Types (integer, double, complex) and embeddings

• (real inside complex, matrices insidse n-d arrays)• Methodology

– Work in Progress: Create a web experience that guides and educates users in the amount of time that it takes to say Wikipedia

– Futuristic?: The Semantic Web and Other Automations?

Page 30: Open Source and Traditional Technical Computing Alan Edelman Massachusetts Institute of Technology Scilabtec10 June 16, 2010

Extra Slides

• Extra Slides

Page 31: Open Source and Traditional Technical Computing Alan Edelman Massachusetts Institute of Technology Scilabtec10 June 16, 2010

The Linear Algebra World

• Some of the world’s best practices.

• Still Inconsistent, still complicated.

• Never go wrong reading a book authored by someone named Nick, a paper or key example by Velvel, all the good stuff in terms of error bounds of LAPACK and papers from our community

Page 32: Open Source and Traditional Technical Computing Alan Edelman Massachusetts Institute of Technology Scilabtec10 June 16, 2010

Computing Environments: what’s appropriate?

• “Good” numerics?– Who defines good?– How do we evaluate?

• Seamless portability?– Rosetta stone?– “Consumer Reports?”

• Easy to learn?• Good documentation?• Proprietary or Free?

– “Download, next, next, next, enter, and compute?”

Page 33: Open Source and Traditional Technical Computing Alan Edelman Massachusetts Institute of Technology Scilabtec10 June 16, 2010

Numerical Errors: What’s appropriate?

• The exact answer?• What MATLAB® gives!• float(f(x))??? (e.g. some trig libraries)• Small Forward Error• Small Backward Error• Nice Mathematical Properties• What people expect, right or wrong• Consistent?• Nice mathematics?

Page 34: Open Source and Traditional Technical Computing Alan Edelman Massachusetts Institute of Technology Scilabtec10 June 16, 2010

Coverage: Sine Function Complex Support

• All packages seem good» Maple: evalf(sin(I))» Mathematica: N[Sin[I]]» MATLAB: sin(i) » Octave: “ “» Numpy: sin(1j)» R: sin(1i)» Ruby: require 'complex'; Complex::I; Math::sin(Complex::I)

» Scilab: sin(%i)» EXCEL: =IMSIN(COMPLEX(0,1))

Note special command in excel

Page 35: Open Source and Traditional Technical Computing Alan Edelman Massachusetts Institute of Technology Scilabtec10 June 16, 2010

Coverage: Erf Function Complex Support

python/scipy is the only numerical package good• Maple: evalf(erf(I)); 1.650425759 I• Mathematica: N[Erf[I]] 0. + 1.65043 I• MATLAB: erf(i) ??? Input must be real.

• Octave: “ “ error: erf: unable to handle complex arguments

• Python/Scipy: erf(1j) 1.6504257588j• R: pnorm(1i) Error in pnorm … Non-numeric argument …

• Scilab: erf(%i) !--error 52 argument must be a real …