brace yourselves, leap second is coming

Nati Cohen, Fewbytes

BRACE YOURSELVES

LEAP SECOND IS COMING

http://www.fewbytes.com/


Intro: Assumptions

Installing servers

1. Unbox
2. Mount
3. Connect to power
4. Connect to network
5. Power up
6. Network boot
7. …
8. Profit

(Not) Installing servers

1. Unbox
2. Mount
3. Connect to power
4. Connect to network
5. Power up
6. …
7. …...
8. ……...

Solving problems 101

1. Blame it on the network

2. DHCP issue?

3. PXE issue?

4. Problematic server?

We checked everything

at least anything that seemed plausible

But after 5 days...

Same MAC address

MAC address: a media access control address (MAC address) is a unique identifier assigned to network interfaces for communications on the physical network segment.

Once in a lifetime thing?

● HP servers & switches, Dell PCs, D-Link Access Points, Lenovo tablets, Android custom ROMs

Also:
● 2008- Hyper-V
● 2010- Xen 5.3
● 2012- libvirt 0.10.2
● 2013- VMware ESXi 5.x
● 2014- lxc-clone

http://h20564.www2.hp.com/hpsc/doc/public/display?docId=emr_na-c01042160

http://h20564.www2.hp.com/hpsc/doc/public/display?docId=mmr_kc-0102757

http://en.community.dell.com/support-forums/network-internet-wireless/f/3324/t/6039965

http://community.spiceworks.com/topic/278156-d-link-dap-2553-aps-with-duplicate-mac-addresses

https://community.landesk.com/support/message/109399

http://forum.xda-developers.com/showthread.php?t=2121151

http://blogs.technet.com/b/jhoward/archive/2008/07/15/hyper-v-mac-address-allocation-and-apparent-network-issues-mac-collisions-can-cause.aspx

https://bugzilla.redhat.com/show_bug.cgi?id=483884

https://www.redhat.com/archives/libvir-list/2012-October/msg01509.html

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2030783

https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1346815

We all make assumptions

● Development
● Debugging
● Marketing
● Support
● …

Being informed helps us avoid them!

Agenda

1. Intro: Assumptions

2. Missiles and Rounding Errors

3. Breaking the Internet in one second

4. Aviation Safety

Patriot missile defense system

1. Search

2. Validate

3. Track

Next position = Velocity (real) * Time (int -> real)

GAO/IMTEC-92-26 - Software Problem Led to System Failure at Dhahran, Saudi Arabia

http://www.gao.gov/products/IMTEC-92-26


to_seconds(ttos)

ttos = 288000 // 8*60*60*10, i.e. 8 hours in tenths of a second
ttos * 0.1 = ?

1. 28800.0
2. 28799.9725
3. 28799.999999861397
4. ???

It depends on the representation...

Rational Numbers

Recall, integers: 1337 = 00000101 00111001

Option 1: Rational Numbers
● (numerator, denominator)
● PROs: exact representation
● CONs: Pi / sqrt(2), space, speed
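A minimal Python 3 sketch of the rational-number option, using the standard `fractions.Fraction` type. It assumes the 8-hour, tenth-of-a-second tick count from the Patriot example; the variable names are illustrative only.

```python
from fractions import Fraction

# A tenth of a second, stored exactly as numerator/denominator.
tenth = Fraction(1, 10)

# 8 hours of uptime counted in tenths of a second.
ticks = 8 * 60 * 60 * 10

exact_seconds = ticks * tenth      # Fraction arithmetic: no rounding
float_sum = sum([0.1] * 10)        # binary floating point: error accumulates

print(exact_seconds)               # 28800
print(exact_seconds == 28800)      # True
print(float_sum == 1.0)            # False
print(sum([tenth] * 10) == 1)      # True
```

The price is the one listed under CONs: numerators and denominators can grow without bound, and irrational values such as Pi or sqrt(2) still cannot be stored exactly.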

Fixed Point

Option 2: Fixed-point
● variable * (base ^ scaling factor)
● base and scaling factor are fixed
  ○ binary vs decimal
  ○ +/- exponent
● PROs: space, easy to compute
● CONs: limited range of values

eg. variable = 30, base = 2, scaling factor = -3

00011.110 (binary) = 2^1 + 2^0 + 2^-1 + 2^-2 = 3.75
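The fixed-point example above can be sketched in Python 3 as follows. The helper names `fixed_to_value` and `value_to_fixed` are made up for this illustration, and the conversion truncates rather than rounds, which is an assumption of the sketch.

```python
from fractions import Fraction

def fixed_to_value(raw, base=2, scale=-3):
    """Interpret the integer `raw` as raw * base**scale, exactly."""
    return raw * Fraction(base) ** scale

def value_to_fixed(value, base=2, scale=-3):
    """Store `value` as a whole number of base**scale steps, truncating."""
    return int(Fraction(value) / Fraction(base) ** scale)

raw = 0b00011110                     # the stored variable: 30
print(float(fixed_to_value(raw)))    # 3.75
print(value_to_fixed(3.75))          # 30
print(value_to_fixed(3.8))           # 30, the extra 0.05 is lost
```

With a scaling factor of -3 the smallest representable step is 1/8, so 3.75 and 3.8 end up in the same stored integer.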

Rounding errors

0.125 = 2^-3

0.1 = 2^-4 + 2^-5 + 2^-8 + 2^-9 + 2^-12 + 2^-13 + ...

with an 8-bit variable: 0.1 ≈ 0.09375

with a 24-bit variable: 0.1 ≈ 0.09999990463256836
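The two approximations above can be reproduced with a short Python 3 sketch. It assumes that an n-bit variable keeps n-1 binary digits after the point and chops the rest (no rounding), because that assumption yields exactly the values shown; `truncate_to_bits` is a hypothetical helper name.

```python
from fractions import Fraction

def truncate_to_bits(value, frac_bits):
    """Keep only `frac_bits` binary digits after the point (chop, do not round)."""
    scale = 2 ** frac_bits
    return Fraction(int(value * scale), scale)

tenth = Fraction(1, 10)
for frac_bits in (7, 23):
    stored = truncate_to_bits(tenth, frac_bits)
    # fractional bits, stored value, error compared with the true 0.1
    print(frac_bits, float(stored), float(tenth - stored))

# 0.125 is a power of two, so it survives unchanged.
print(truncate_to_bits(Fraction(1, 8), 3) == Fraction(1, 8))   # True
```

The error shrinks as bits are added but never reaches zero, since 0.1 has an infinite binary expansion.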

Floating Point

Option 3: Floating-point (IEEE 754)
● (mantissa * 2^exponent)
● eg. float- 1 sign bit, 23 mantissa bits, 8 exponent bits
● PROs: wide range, fast w/ FPU
● CONs: accuracy

Caveats- NaNs, signed zero/infinity, denorm, rounding, ...

IEEE Standards Association. "Standard for Floating-Point Arithmetic." IEEE 754-2008 (2008).

Goldberg, David. "What every computer scientist should know about floating-point arithmetic." ACM Computing Surveys (CSUR) 23.1 (1991): 5-48.

http://www.math.fsu.edu/~gallivan/courses/FCM1/IEEE-fpstandard-2008.pdf.gz

http://www.validlab.com/goldberg/paper.pdf
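To make the 1/8/23 bit layout of a single-precision float concrete, here is a small Python 3 sketch that uses the standard `struct` module to pull the three fields out of 0.1. The function name `float32_fields` is invented for this example, and the reconstruction formula covers normal numbers only (not NaNs, infinities or denormals).

```python
import struct

def float32_fields(x):
    """Split x, rounded to IEEE 754 single precision, into its three fields."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF       # stored with a bias of 127
    mantissa = bits & 0x7FFFFF           # 23 stored bits, implicit leading 1
    return sign, exponent, mantissa

sign, exponent, mantissa = float32_fields(0.1)

# Rebuild the value of a normal number from its fields.
value = (-1) ** sign * (1 + mantissa / 2 ** 23) * 2.0 ** (exponent - 127)

print(sign, exponent, hex(mantissa))     # 0 123 0x4ccccd
print(value == 0.1)                      # False: the float holds a nearby value
```

The stored number is close to 0.1 but not equal to it, which is the "accuracy" item in the CONs list.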

Picking the right tool

Rational Numbers - fractions.Fraction

Decimal Fixed-point - decimal

Binary Fixed-point - spfpm module

Floating-point issues and limitations

https://docs.python.org/2/library/fractions.html

https://docs.python.org/2/library/decimal.html

https://pypi.python.org/pypi/spfpm/

https://docs.python.org/2/tutorial/floatingpoint.html

Recall: Patriot system

1. Search

2. Validate

3. Track

Next position = Velocity (real) * Time (int -> real)




Patriot system cont’d

● After 8 hours: 0.0275 seconds error
● After 100 hours: 0.3433 seconds error

Scud velocity is ~ 1,676 meters per second

687 meters error
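The two time errors quoted above follow directly from the chopped value of 0.1. The Python 3 sketch below assumes a 24-bit register with 23 fractional bits, the layout that reproduces the 0.09999990463256836 figure from the rounding-errors slide; the constant and function names are illustrative.

```python
from fractions import Fraction

FRAC_BITS = 23               # assumption: 24-bit register, 23 bits after the point
TICKS_PER_SECOND = 10        # the system clock counts tenths of a second

exact_tenth = Fraction(1, 10)
stored_tenth = Fraction(int(exact_tenth * 2 ** FRAC_BITS), 2 ** FRAC_BITS)
error_per_tick = exact_tenth - stored_tenth   # about 0.000000095 seconds

def clock_error_seconds(hours_up):
    """Accumulated clock error after `hours_up` hours of continuous operation."""
    ticks = hours_up * 60 * 60 * TICKS_PER_SECOND
    return float(ticks * error_per_tick)

for hours in (8, 100):
    print(hours, round(clock_error_seconds(hours), 4))
# 8 0.0275
# 100 0.3433
```

A per-tick error of less than a ten-millionth of a second looks harmless; it only matters because it is multiplied by millions of ticks and then by the target's velocity.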

Patriot system cont’d

On February 25, 1991, a Patriot missile defense system operating at Dhahran, Saudi Arabia, during Operation Desert Storm failed to track and intercept an incoming Scud.

This Scud subsequently hit an Army barracks, killing 28 Americans.




Let’s talk about time

Time is simple, right?

1 Year = 365 days

Leap year?

Time is (not) simple

1 Year = 365 or 366 days
1 Month = 28/29/30/31 days

Mostly true, except:
in Britain 1752, September had 19 days
in Russia 1918, February had 15 days
in Greece 1923, February had 15 days
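Standard libraries bake in their own assumptions here. As a small Python 3 illustration, the `calendar` module implements the Gregorian leap-year rule and applies it to every year, so it reports a full 30-day September for 1752:

```python
import calendar

# Gregorian rule: divisible by 4, except centuries not divisible by 400.
print(calendar.isleap(1900))   # False
print(calendar.isleap(2000))   # True
print(calendar.isleap(2016))   # True

# The module extends the Gregorian calendar backwards to all dates, so it
# knows nothing about the days Britain skipped in September 1752.
_, days_in_month = calendar.monthrange(1752, 9)
print(days_in_month)           # 30
```

Whether that answer is right depends on which country's calendar the date is supposed to be in, which is exactly the kind of hidden assumption this section is about.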


1 Year = 365 or 366 days*
1 Month = 28/29/30/31 days*
1 Day = 24 hours

Don’t forget DST! It can also be 23/25 or 23.5/24.5

Lord Howe Island, Australia

https://en.wikipedia.org/wiki/Lord_Howe_Island


1 Year = 365 or 366 days*
1 Month = 28/29/30/31 days*
1 Day = 24 hours**
1 Minute = 60 Seconds

NO- a leap second might cause a minute to have 61 seconds, up to twice a year...

June 30, 2015 at 23:59

What could possibly go wrong?

● 2005/8- most NTPs failed to get it right
● 2012- Bugs in Linux
  ○ Reddit, LinkedIn, Yelp, Meetup, Foursquare
  ○ 400 Qantas flights delayed
  ○ Leaping seconds and looping servers
  ○ Linux's leap-second deadlocks
● s += 3600
  ○ "one hour from now" ?
  ○ "same time, next hour" ?
● 1 second in Flight Control = 300 meters
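A short Python 3 sketch of why `s += 3600` is ambiguous around a leap second. It uses only the standard `datetime` and `calendar` modules and the June 30, 2015 leap second as the example instant.

```python
import calendar
from datetime import datetime, timedelta

# datetime has no slot for a 61st second: 23:59:60 is rejected.
try:
    datetime(2015, 6, 30, 23, 59, 60)
    accepted = True
except ValueError:
    accepted = False
print(accepted)                 # False

# POSIX-style timestamps assume 86400 seconds per day, so the leap second
# and the following midnight collapse onto the same number.
leap = calendar.timegm((2015, 6, 30, 23, 59, 60))
midnight = calendar.timegm((2015, 7, 1, 0, 0, 0))
print(leap == midnight)         # True

# Adding 3600 across the leap second gives "same time, next hour" ...
before = datetime(2015, 6, 30, 23, 30, 0)
after = before + timedelta(seconds=3600)
print(after)                    # 2015-07-01 00:30:00
# ... although 3601 real seconds separate those two wall-clock readings.
```

Code that needs "one hour of elapsed time" and code that needs "the same minute of the next hour" therefore cannot both be served by the same addition.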

https://lwn.net/Articles/504744/

http://winningraceconditions.blogspot.co.il/2012/07/linuxs-leap-second-deadlocks.html

Possible solutions

● Simulate leap second
  ○ eg. using adjtimex(8)
● Slewing time (AWS, Google)
  ○ ntptime -s 0 # reset kernel state
  ○ ntpdate -B <some-ntp-server>
● Servers from the future (Google, Facebook)
● Have good monitoring
  ○ NTP offset
  ○ leap second status
  ○ @statscraft

http://manpages.ubuntu.com/manpages/trusty/man8/adjtimex.8.html

https://aws.amazon.com/blogs/aws/look-before-you-leap-the-coming-leap-second-and-aws/

http://googleblog.blogspot.co.il/2011/09/time-technology-and-leaping-seconds.html

https://twitter.com/statscraft

Further reading

● Falsehoods programmers believe about time
● More falsehoods programmers believe about time
● The One-Second War

http://infiniteundo.com/post/25326999628/falsehoods-programmers-believe-about-time

http://infiniteundo.com/post/25509354022/more-falsehoods-programmers-believe-about-time




http://cacm.acm.org/magazines/2011/5/107699-the-one-second-war/fulltext

Let’s talk about planes

787 Boeing Dreamliner

● 30/4/2015- The FAA requires operators to perform an electrical power deactivation at intervals not exceeding 120 days

FAA- Airworthiness Directives; The Boeing Company Airplanes

https://www.federalregister.gov/articles/2015/05/01/2015-10066/airworthiness-directives-the-boeing-company-airplanes#h-29

But, why?

“Boeing ... identified during laboratory testing the software counter internal to the generator control units (GCUs) will overflow after 248 days of continuous power, ... resulting in a loss of all AC electrical power regardless of flight phase”

FAA- Airworthiness Directives; The Boeing Company Airplanes

Wait, 248 days?

● 2^31 / (100 * 60 * 60 * 24) = 248.551

● ie. 2^31 centiseconds (hundredths of a second) are roughly 248 days

● signed integer?
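The arithmetic behind the 248-day figure, as a Python 3 sketch. It assumes, as the division by 100 suggests, that the counter ticks 100 times per second and is stored as a signed 32-bit integer; neither detail is stated explicitly in the FAA text quoted above.

```python
TICKS_PER_SECOND = 100                  # assumption taken from the divisor above
SECONDS_PER_DAY = 60 * 60 * 24

max_signed_32 = 2 ** 31 - 1             # largest value of a signed 32-bit counter
days_until_overflow = (max_signed_32 + 1) / (TICKS_PER_SECOND * SECONDS_PER_DAY)
print(round(days_until_overflow, 3))    # 248.551

# For comparison: an unsigned 32-bit counter at the same rate lasts twice as long.
days_unsigned = 2 ** 32 / (TICKS_PER_SECOND * SECONDS_PER_DAY)
print(round(days_unsigned, 3))          # 497.103
```

The match with the reported 248 days is what points to a signed rather than an unsigned counter.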

Recall two's complement

00000000 0
00000001 1
00000010 2
00000011 3
…
01111111 127
10000000 -128
10000001 -127
…

Arithmetic operations are easy, but might overflow:

  00000001    1
+ 01111111  127
= 10000000 -128
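Python integers do not wrap on their own, but the wrap-around above can be simulated in Python 3 with modular arithmetic. `wrap_signed` is a hypothetical helper written for this illustration.

```python
def wrap_signed(value, bits=8):
    """Reduce an integer to what a signed register of `bits` bits would hold."""
    half = 1 << (bits - 1)
    return (value + half) % (1 << bits) - half

print(wrap_signed(127 + 1))                          # -128
print(format(wrap_signed(127 + 1) & 0xFF, "08b"))    # 10000000
print(wrap_signed(2147483647 + 1, bits=32))          # -2147483648
```

The same function with `bits=32` shows what happens to a signed 32-bit counter one step after 2147483647.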

F**k overflows, I’m using Python

>>> import sys

>>> print sys.maxint, type(sys.maxint)

9223372036854775807 <type 'int'>

>>> print sys.maxint + 1, type(sys.maxint + 1)

9223372036854775808 <type 'long'>

Arbitrary Precision

struct _longobject {
    PyObject_VAR_HEAD
    digit ob_digit[1];
};

● PROs: unlimited*
● CONs: slow, harder to implement
  ○ think about multiplication
● What about builtins? C modules?
  ○ eg. formatting, unicode, itertools, sqlite
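The `ob_digit` array in the struct above holds the number as a sequence of fixed-width "digits". The Python 3 sketch below imitates that layout in pure Python, assuming 30 bits per digit (CPython builds use 15 or 30, reported by `sys.int_info`); `to_digits` and `from_digits` are illustrative names, not CPython functions.

```python
import sys

def to_digits(n, bits_per_digit=30):
    """Split a non-negative integer into digits of `bits_per_digit` bits,
    least significant digit first."""
    mask = (1 << bits_per_digit) - 1
    digits = []
    while True:
        digits.append(n & mask)
        n >>= bits_per_digit
        if n == 0:
            return digits

def from_digits(digits, bits_per_digit=30):
    """Inverse of to_digits."""
    total = 0
    for position, digit in enumerate(digits):
        total += digit << (bits_per_digit * position)
    return total

print(sys.int_info.bits_per_digit)            # digit width of this interpreter
print(to_digits(2 ** 64))                     # [0, 0, 16]
print(from_digits(to_digits(2 ** 100)) == 2 ** 100)   # True
```

Every arithmetic operation has to loop over such digit arrays, which is where the "slow, harder to implement" items in the CONs list come from.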

PEP 237 -- Unifying Long Integers and Integers
Python 2.7.10 source code, “Include/longintrepr.h”

https://bugs.python.org/issue14700



http://bugs.python.org/issue17073

https://www.python.org/dev/peps/pep-0237/

Know your language (Ruby)

[1] pry(main)> a = 1337

=> 1337

[2] pry(main)> a.class

=> Fixnum

[3] pry(main)> a = 2**100

=> 1267650600228229401496703205376

[4] pry(main)> a.class

=> Bignum

Know your language (Scala)

scala> 2147483647 + 1

res1: Int = -2147483648

scala> Math.addExact(2147483647, 1)

java.lang.ArithmeticException: integer overflow

scala> Math.pow(2, 1024)

res4: Double = Infinity

Know your language (JavaScript)

> Number.MAX_VALUE

1.7976931348623157e+308

> Number.MAX_VALUE*2

Infinity

> Number.MAX_VALUE + 1 - Number.MAX_VALUE

0

> Number.MAX_VALUE - Number.MAX_VALUE + 1

1
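The same behaviour can be observed in Python 3, whose `float` is an IEEE 754 double on typical platforms, just like a JavaScript number. This is a sketch for comparison, not part of the original slides.

```python
import sys

big = sys.float_info.max
print(big)               # 1.7976931348623157e+308
print(big * 2)           # inf
print(big + 1 - big)     # 0.0, the 1 is absorbed by the huge value
print(big - big + 1)     # 1.0, same operands, different order
print(big + 1 == big)    # True
```

Floating-point addition is not associative, so the order of operations decides whether the small term survives.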

From the news*

● January 2014:
  ○ >67m players per month
  ○ >27m players per day
  ○ >7.5m concurrently during peak hours
  ○ 946m dollar yearly revenue
● 12/6/2015- The EU West Spectator mode crashed
  ○ game counter just surpassed 2147483647 (2B)
  ○ ie. 2**31 games...

League of Legends tops MMO revenue list, Hearthstone No. 10
LEAGUE PLAYERS REACH NEW HEIGHTS IN 2014
EUW spectator mode fell over recently – here’s why

http://www.engadget.com/2014/10/23/league-of-legends-tops-mmo-revenue-list-hearthstone-no-10/

http://www.riotgames.com/articles/20140711/1322/league-players-reach-new-heights-2014

http://euw.leagueoflegends.com/en/news/riot-games/announcements/euw-spectator-mode-fell-over-recently-heres-why

Take home message

● MAC addresses aren’t always unique
● Real numbers can be deadly
● Time is not simple
● Beware of the overflow

We all make assumptions, let’s make fewer

Thank You!

Nati Cohen

Fewbytes

[email protected]

@nocoot



https://twitter.com/nocoot
