Float is Legacy

Kenta Murata

RubyConf 2011

Monday, October 10, 2011

http://www.flickr.com/photos/recompile_net/5951998279/

Kenta Murata

CRuby committer

bigdecimal maintainer

OS X platform maintainer

Interested in number system

@mrkn


https://twitter.com/#!/shyouhei/status/119802983423807488

Ruby Sapporo


Sapporo, Japan

http://www.flickr.com/photos/muraken/6174655831

Sapporo, Japan

http://www.flickr.com/photos/irasally/4708650832/

The RubyKaigi is finished.


Regional RubyKaigi continues.


Sapporo RubyKaigi 04 will be held next summer.


Official information will be coming soon.


Acknowledgement

Tatsuhiro Ujihisa, @ujm, HootSuite Media, Inc.

Yoshimasa Niwa, @niw, Twitter, Inc.




Summary

Float requires advanced knowledge to use correctly.

Most Rubyists don’t need Float.

Rational is enough for us.

Interpreting decimal fraction literals as Rational would make us happier.


Float class


What is Float class?

A wrapper for C double.

Boxing a value of double.

Need to allocate an object to generate a new Float.


Do you know C double?

Floating point number with double precision.

No concrete representation is specified.

Most current platforms employ IEEE754.

It is IEEE754 binary64 on these platforms.

Some platforms employ other specifications.
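To see which representation a given Ruby build sits on, the constants defined on Float report the parameters of the underlying double. This is a minimal sketch; the helper names `float_format_summary` and `binary64?` are made up for illustration, and the check is a heuristic rather than a guarantee of IEEE754 conformance.

```ruby
# Summarize the parameters of the C double behind Float.
# float_format_summary is a hypothetical helper name, not part of Ruby.
def float_format_summary
  {
    radix: Float::RADIX,      # base of the exponent part
    digits: Float::MANT_DIG,  # number of base-RADIX digits in the fraction part
    min_exp: Float::MIN_EXP,  # smallest exponent, as reported by the C library
    max_exp: Float::MAX_EXP   # largest exponent, as reported by the C library
  }
end

# True when the parameters match IEEE754 binary64 (a heuristic check only).
def binary64?
  s = float_format_summary
  s[:radix] == 2 && s[:digits] == 53 && s[:max_exp] == 1024
end

p float_format_summary
puts binary64?
```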


CRuby and JIS Ruby

Neither requires IEEE754.


Floating point numbers


The origin

$$N_A = +6.022\,141\,79 \times 10^{23}\ (\pm 0.000\,000\,0030 \times 10^{23})\ [\mathrm{1/mol}]$$

$$h = +6.626\,069\,57 \times 10^{-34}\ (\pm 0.000\,000\,0029 \times 10^{-34})\ [\mathrm{J\,s}]$$

sign: $s \in \{0, 1\}$

fraction part: $0 \le f \le B^N - 1$

exponent part: $e_{\min} \le e - q \le e_{\max}$


Floating point numbers

A number is identified by the triple (s, e, f).

They represent approximations of real numbers.

Float types can be described by B, N, q, emin, and emax.

B is the base of the exponent part.

N is the number of digits in the fraction part.

q is the bias of the exponent part.

emax and emin specify the limits of the exponent part.


$$(s, e, f) = (-1)^s \times \frac{f}{B^N} \times B^{e-q}$$


e.g. IEEE754 binary64

B = 2

N = 53

q = 1,023

emin = –1,022

emax = +1,023

The maximum positive: $1.797\,693\,134\,862\,315\,7 \times 10^{+308}$

The minimum nonzero positive: $2.225\,073\,858\,507\,201\,4 \times 10^{-308}$
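The (s, e, f) description can be tried out on real Float values. The sketch below assumes an IEEE754 binary64 platform (B = 2, N = 53) and returns the unbiased exponent e − q directly; `decompose` and `compose` are hypothetical helper names, and special values such as infinity and NaN are not handled.

```ruby
# Number of binary digits in the fraction part (53 on IEEE754 binary64).
FRACTION_DIGITS = Float::MANT_DIG

# Split a finite Float into [s, e - q, f] so that
#   x == (-1)**s * (f / 2**FRACTION_DIGITS) * 2**(e - q)
def decompose(x)
  sign = x < 0 ? 1 : 0
  # Math.frexp returns [m, exp] with x.abs == m * 2**exp and 0.5 <= m < 1
  # (or m == 0 for zero).
  fraction, exponent = Math.frexp(x.abs)
  f = Math.ldexp(fraction, FRACTION_DIGITS).to_i
  [sign, exponent, f]
end

# Rebuild the exact value as a Rational from the three parts.
def compose(sign, exponent, f)
  (-1)**sign * Rational(f, 2**FRACTION_DIGITS) * Rational(2)**exponent
end

s, e, f = decompose(1.5)
puts "s=#{s} e-q=#{e} f=#{f}"
puts compose(s, e, f) == 1.5.to_r
```

Because `compose` works in Rational arithmetic, it reproduces the value the Float actually stores, which for a literal such as 0.1 differs from 1/10.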




e.g. IEEE754 decimal64

B = 10

N = 16

q = 398

emin = –383

emax = +384

The maximum positive: $9.999\,999\,999\,999\,999 \times 10^{+384}$

The minimum nonzero positive: $0.000\,000\,000\,000\,001 \times 10^{-383}$


e.g. IBM’s double precision

B = 16

N = 56

q = 64

emin = –64

emax = +63

The maximum positive: $7.237\,005\,577\,332\,262\,11 \times 10^{+75}$

The minimum nonzero positive: $5.397\,605\,346\,934\,027\,89 \times 10^{-79}$




Every float is an approximation



[Figure: number line marking the real numbers –1, 0, and 3/2, with braces beneath labeled –1.0, 0.0, and 1.5]


We should assume:

No number is represented exactly.

Floating point numbers always include errors.

The magnitude of the error depends on B, N, and e.


Why do floats include errors?

It is an unavoidable consequence of rounding in place-value notation with a finite number of digits.

Very few values can be specified exactly.

We shouldn’t expect a given value to be exact.


How many decimal fractions can be represented exactly as binary fractions?


Decimal form:

$$(0.1234)_{10} = \frac{(1234)_{10}}{10^4}$$

$$0.d_1 d_2 \cdots d_m = \frac{(d_1 d_2 \cdots d_m)_{10}}{10^m}$$

Binary form:

$$(0.10111)_2 = \frac{(10111)_2}{2^5}$$

$$0.b_1 b_2 \cdots b_n = \frac{(b_1 b_2 \cdots b_n)_2}{2^n}$$

When the numerator is a multiple of $5^m$, that is $(d_1 d_2 \cdots d_m)_{10} = C \cdot 5^m$:

$$\frac{(d_1 d_2 \cdots d_m)_{10}}{10^m} = \frac{(d_1 d_2 \cdots d_m)_{10}}{2^m\,5^m} = \frac{C\,5^m}{2^m\,5^m} = \frac{C}{2^m}$$


[Figure: the ratio of exact numbers and the ratio of inexact numbers (vertical axis, 0.0 to 1.0) plotted against the number of decimal digits (horizontal axis, 0 to 30); 17 digits is marked as IEEE754 binary64]



Decimal in Binary

An N-digit decimal fraction is exactly representable in binary notation only if its numerator is divisible by $5^N$.

The ratio of N-digit decimal fractions exactly representable as binary fractions is $1/5^N$.

In IEEE754 binary64, almost all decimal fractions are inexact.
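The 1/5^N ratio can be checked by brute force for small N. The sketch below enumerates every N-digit decimal fraction k / 10^N and counts those whose reduced denominator is a power of two; `power_of_two?` and `exact_ratio` are hypothetical helper names, and precision limits of any particular float format are ignored.

```ruby
# True when n is 1, 2, 4, 8, ...
def power_of_two?(n)
  n > 0 && (n & (n - 1)).zero?
end

# Ratio of N-digit decimal fractions k / 10**N (0 <= k < 10**N) that are
# also finite binary fractions. Rational reduces to lowest terms for us.
def exact_ratio(digits)
  total = 10**digits
  exact = (0...total).count { |k| power_of_two?(Rational(k, total).denominator) }
  Rational(exact, total)
end

(1..4).each do |n|
  puts "N=#{n}: measured #{exact_ratio(n)}, formula #{Rational(1, 5**n)}"
end
```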


Floating-point arithmetic

add, sub, mul, div, sqrt, ...

These operations introduce errors.

For a detailed description, please read:

“What Every Computer Scientist Should Know About Floating-Point Arithmetic”
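A small illustration of an operation that carries an error, assuming an IEEE754 binary64 platform; the method names are made up for this example.

```ruby
# Float arithmetic rounds its operands and its result.
def float_sum_exact?
  0.1 + 0.2 == 0.3
end

# Rational arithmetic keeps the exact value.
def rational_sum_exact?
  Rational(1, 10) + Rational(2, 10) == Rational(3, 10)
end

puts float_sum_exact?         # false on IEEE754 binary64
puts rational_sum_exact?      # true
puts((0.1 + 0.2) - 0.3)       # a small non-zero rounding error
```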


Decimal fraction of Ruby


What’s the problem?

Ruby interprets decimal fraction literals as Float.

The following three numbers are Floats, so they carry errors.

1.0

1.2

0.42e+12
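What the parser actually stores for such a literal can be inspected with `Float#to_r`, which returns the exact value of the Float. This is a minimal sketch assuming IEEE754 binary64; `Rational("1.2")` is used here only to obtain the exact value the digits denote, and the method names are hypothetical.

```ruby
# The literal 1.2 is a Float: it is already rounded to the nearest binary
# fraction before any arithmetic happens.
def literal_as_float
  1.2
end

# The same digits read as an exact Rational, 6/5.
def literal_as_rational
  Rational("1.2")
end

puts literal_as_float.class
puts literal_as_float.to_r    # the binary fraction actually stored
puts literal_as_rational
puts literal_as_float.to_r == literal_as_rational
```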


The issues from Float

Many issues about Float have been reported to redmine.ruby-lang.org.

I think they arise because Ruby interprets decimal fraction literals as Float.

Do you know these issues?


http://redmine.ruby-lang.org/issues/4576

Demonstration


$ ruby -v
ruby 1.9.4dev (2011-09-28 trunk 33354) [x86_64-darwin10.8.0]
$ irb --simple-prompt
>> (1.0 .. 12.7).step(1.3).to_a
=> [1.0, 2.3, 3.6, 4.9, 6.2, 7.5, 8.8, 10.1, 11.4, 12.700000000000001]
>> (1.0 ... 128.4).step(18.2).to_a
=> [1.0, 19.2, 37.4, 55.599999999999994, 73.8, 92.0, 110.19999999999999, 128.39999999999998]
>> (1.0 ... 128.4).step(18.2).to_a.size
=> 8
>> (1 ... 1284.quo(10)).step(182.quo(10)).to_a
=> [1, (96/5), (187/5), (278/5), (369/5), (92/1), (551/5)]
>> (1 ... 1284.quo(10)).step(182.quo(10)).to_a.size
=> 7



The last value of the array should be equal to the end of the range



Some elements include errors



The array size is one larger than the correct size


Range#step with Float

The first case

The last value of the array is not equal to the end of the range.

The second case

Some elements include errors.

The array size is one larger than the right size.
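The second case can be reproduced without relying on how a particular Ruby version implements Range#step. The sketch below uses a hypothetical helper, `step_values`, that computes first + i * step for i = 0, 1, 2, ...; it is not Ruby's own implementation, and the Float result may differ between platforms and versions.

```ruby
# Collect first, first + step, first + 2 * step, ... up to last.
# With exclusive: true the value equal to last is left out.
# step_values is a hypothetical helper, not Ruby's Range#step.
def step_values(first, last, step, exclusive: false)
  values = []
  i = 0
  loop do
    v = first + i * step
    break if exclusive ? v >= last : v > last
    values << v
    i += 1
  end
  values
end

float_steps    = step_values(1.0, 128.4, 18.2, exclusive: true)
rational_steps = step_values(Rational(1), Rational(1284, 10), Rational(182, 10), exclusive: true)

puts float_steps.size
puts rational_steps.size   # 7, because 1 + 7 * 182/10 is exactly 1284/10
puts rational_steps.last   # 551/5
```

With Rational operands the eighth candidate equals the end of the range exactly and is excluded; with Float operands that comparison depends on rounding.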


Rational with decimal notation

Introduce one flag into the Rational object.

The flag records whether the Rational is written as a fraction or as a decimal.

If the flag is true, to_s converts the Rational to a decimal string.
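The idea of the flag can be sketched in plain Ruby. `DecimalRational` below is a hypothetical wrapper class written only for illustration; the proposal described here puts the flag inside Rational itself, and the fallback to fraction notation for non-terminating values is an assumption of this sketch.

```ruby
# Hypothetical wrapper: a Rational plus a flag that selects decimal output.
class DecimalRational
  attr_reader :value

  def initialize(value, decimal: true)
    @value = Rational(value)
    @decimal = decimal
  end

  def to_s
    if @decimal && terminating?
      decimal_string
    else
      @value.to_s
    end
  end

  private

  # A fraction has a finite decimal expansion only if its reduced
  # denominator has no prime factors other than 2 and 5.
  def terminating?
    d = @value.denominator
    [2, 5].each { |p| d /= p while (d % p).zero? }
    d == 1
  end

  # Find the smallest number of decimal places (at least one) that makes
  # the scaled value an integer, then insert the decimal point.
  def decimal_string
    places = 1
    places += 1 until (@value * 10**places).denominator == 1
    int, frac = (@value.abs * 10**places).numerator.divmod(10**places)
    sign = @value < 0 ? "-" : ""
    "#{sign}#{int}.#{frac.to_s.rjust(places, '0')}"
  end
end

puts DecimalRational.new(Rational(6, 5))
puts DecimalRational.new(Rational(6, 5), decimal: false)
puts DecimalRational.new(Rational(1, 3))
```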


Literal for Rational with decimal notation

A simple change to the parser.

Decimal fraction literals without an exponent are interpreted as Rational with decimal notation.

Decimal fraction literals with an exponent remain Float.
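The rule can be mimicked outside the parser by mapping the source text of a literal to the object it would produce. `interpret_decimal_literal` is a hypothetical stand-in for the parser change, not code from the patch, and it assumes the input is already a valid decimal fraction literal.

```ruby
# Map the text of a decimal fraction literal to a value under the proposed rule.
def interpret_decimal_literal(source)
  if source.match?(/e/i)
    Float(source)      # a literal with an exponent stays Float
  else
    Rational(source)   # a literal without an exponent becomes an exact Rational
  end
end

%w[1.0 1.2 0.42e+12].each do |literal|
  value = interpret_decimal_literal(literal)
  puts "#{literal} -> #{value.inspect} (#{value.class})"
end
```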


Demonstration using the patched Ruby

https://github.com/mrkn/ruby/tree/decimal_rational_implementation

$ ruby -v
ruby 1.9.4dev (2011-09-28 trunk 33354) [x86_64-darwin10.8.0]
$ irb --simple-prompt
>> (1.0 .. 12.7).step(1.3).to_a
=> [1.0, 2.3, 3.6, 4.9, 6.2, 7.5, 8.8, 10.1, 11.4, 12.7]
>> (1.0 .. 12.7).step(1.3).map(&:class)
=> [Rational, Rational, Rational, Rational, Rational, Rational, Rational, Rational, Rational, Rational]
>> (1.0 ... 128.4).step(18.2).to_a
=> [1.0, 19.2, 37.4, 55.6, 73.8, 92.0, 110.2]
>> (1.0 ... 128.4).step(18.2).to_a.size
=> 7
>> (1 ... 1284.quo(10)).step(182.quo(10)).to_a
=> [1, (96/5), (187/5), (278/5), (369/5), (92/1), (551/5)]
>> (1 ... 1284.quo(10)).step(182.quo(10)).to_a.size
=> 7



The last value of the array is equal to the end of the range.



All elements in the array are Rational rather than Float.



The result array size is correct.


Benchmarking

Comparing Float, Rational, and C double.

Experimental environment:

MacBook Pro 15in (Mid 2010)

Core i7 2.66 GHz

Ruby 1.9.4dev (r33300) with gcc-4.2 -O3

C with llvm-gcc -O0
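A comparison of this kind can be sketched with Ruby's standard Benchmark module. The code below is not the code from the talk's gist; `time_operation`, the operand values, and the iteration count are assumptions chosen for illustration, and absolute timings will differ by machine and Ruby version.

```ruby
require "benchmark"

# Time `count` repetitions of a binary operation, feeding each result back in.
# time_operation is a hypothetical helper, not taken from the original benchmark.
def time_operation(initial, operand, count)
  Benchmark.realtime do
    acc = initial
    count.times { acc = yield(acc, operand) }
  end
end

COUNT = 1_000_000

float_add    = time_operation(0.0, 1.3, COUNT) { |a, b| a + b }
rational_add = time_operation(Rational(0), Rational(13, 10), COUNT) { |a, b| a + b }

printf("Float: %.3f s, Rational: %.3f s\n", float_add, rational_add)
```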


Benchmarking code

Ruby code

https://gist.github.com/1253088

C code







Elapsed time [s], based on ruby-1.9.4dev (r33300)

                      Float    Rational    C double
1M additions          0.37     2.16        0.00777
1M subtractions       0.73     2.17        0.00670
1M multiplications    0.70     1.78        0.00770


Benchmarking summary

Rational is 2-5 times slower than Float.

Float is about two orders of magnitude slower than C double.

C is amazingly fast.


If you say Rational is slow, remember that Float isn’t as fast as you expect.


Rational vs Float


Rational vs Float

Exact computation is required by domains such as finance.

Float is required by scientific computation.

Other aspects are independent of the choice between Rational and Float.


Conclusion

Float is difficult, troublesome, and not human-oriented.

Rational is easy to understand and human-oriented.

We would be happier if Ruby interpreted decimal fraction literals as Rational.


Float is Legacy

