factorials approximation (pdf)

Sizing up factorials

Another math essay by Dave Coulson, 2017


Lately I have been interested in working out ways to estimate the size of factorial numbers.

It started a few days ago when I was solving a special kind of crossword puzzle called a codebreaker.

This is a codebreaker. Letters have been replaced with numbers.

Source: http://www.supercoloring.com/puzzle-games/codebreaker-word-puzzle

My job is to translate the numbers back into letters in such a way that the strings of numbers spell words.

There should be only one correct translation, but I started to wonder if it was possible for a codebreaker to have two or more solutions. What were the odds against such a thing happening? And how hard would it be to make a code that had two legitimate solutions?

This should be very difficult to do, because each letter has its own unique character. It would be like forcing a J to behave like an E, etc.



Intuition convinces me that this should be impossible unless the creator of the code was very gifted and put a lot of effort into deliberately making it possible.




.... But who really knows?

One way to find out would be to get a computer to substitute numbers with letters in some systematic way, going through all the possible translations, and checking the spelling of ‘words’ in each case.

How long would this take? How many translations would have to be tested?

The mathematicians reading this have already spotted that this is a factorial problem. There are 26 number choices for the letter A and each of those choices leaves 25 choices for the letter B and each of those choices leaves 24 choices for the letter C.... and so on right through the alphabet.

26 x 25 x 24 x 23 x 22 x 21 x 20 x 19 x 18 x 17 x 16 x 15 x 14 x 13 x 12 x 11 x 10 x 9 x 8 x 7 x 6 x 5 x 4 x 3 x 2 x 1

.... also known as 26 factorial (mathematical notation: “26!”).

What comes out is 403,291,461,126,605,635,584,000,000.
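For anyone who wants to check the count by machine, here is a minimal Python sketch. It multiplies the choices together one letter at a time, exactly as described above, and compares the result with the standard library's `math.factorial`. The function name `count_translations` is only an illustrative label.

```python
import math

def count_translations(alphabet_size: int) -> int:
    """Count the one-to-one assignments of numbers to letters."""
    total = 1
    # alphabet_size choices for the first letter, one fewer for the next, ...
    for choices_left in range(alphabet_size, 0, -1):
        total *= choices_left
    return total

translations = count_translations(26)
assert translations == math.factorial(26)
print(f"{translations:,}")       # the full number with thousands separators
print(len(str(translations)))    # how many digits it has
```

Python integers have no fixed size limit, so the product is exact rather than rounded.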

It’s definitely a huge number, but how huge is huge? Is it a zillion gazillion or a hundred zillion gazillion? Normally estimating numbers is easy to do, but factorial numbers are particularly resistant to estimation. I realised that I could not even estimate the size of the number using any method I could think of .... until I got back home and started analysing the problem on paper and spreadsheet.

The result was a journey that took me surprisingly close to Stirling’s famous formula, and then onwards to creating a really simple method for estimating the size of a factorial number.

Now how big is 26 factorial?

The first step is to turn this massive multiplication problem into a massive addition problem. Our minds work better with addition. We can visualise N blocks being added together in a stacked structure much better than finding the volume of an object in hyperspace with N dimensions.

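$$ n! = 1 \times 2 \times \cdots \times n = 10^{\operatorname{Log} 1} \times 10^{\operatorname{Log} 2} \times \cdots \times 10^{\operatorname{Log} n} = 10^{\sum_{k=1}^{n} \operatorname{Log} k} $$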

In this form, the problem consists of adding several Logs together and exponentiating the sum.
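The idea of adding Logs and then exponentiating can be tried out directly. The following is a small Python sketch, assuming base-10 Logs throughout; `log_sum` is a hypothetical helper name chosen for this illustration.

```python
import math

def log_sum(n: int) -> float:
    """Add up Log 1 + Log 2 + ... + Log n (base 10)."""
    return sum(math.log10(k) for k in range(1, n + 1))

n = 26
total = log_sum(n)
# Exponentiating the sum recovers the factorial, up to floating-point rounding.
print(total)
print(10 ** total)
print(math.factorial(n))
```

The whole-number part of the sum is one less than the number of digits in the factorial, which is why the sum works as a measure of size.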

The Logs make a staircase that rises from left to right in a way that is not too far from linear, at least in this view. The sum therefore covers a bit more than half of the box it sits inside.

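$$ \sum_{k=1}^{n} \operatorname{Log} k \approx \tfrac{1}{2}\, n \operatorname{Log} n \qquad\Rightarrow\qquad n! \sim 10^{\frac{1}{2} n \operatorname{Log} n} $$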

The unwavering decline in the steepness of the staircase, which is always there no matter how far rightwards we proceed, means that the true sum of the Logs will always be larger than estimated by a straight staircase.

A bit of trial and error shows that this is a really awful approximation.

f is some fraction to be determined experimentally. It’s bigger than ½ and smaller than 1.

$$ \sum_{k=1}^{n} \operatorname{Log} k \approx f \cdot n \operatorname{Log} n $$

But the picture shows that we can get reasonable accuracy by choosing a fraction bigger than ½, something closer to ¾ .


For smaller two-digit numbers, the fraction 2/3 will do. For numbers close to 100, the fraction 0.7 is better, and for numbers well over 100, it is better to use 0.75 or even higher.

So I have used my spreadsheet to calculate all the factorials between 1 and 170 – the biggest number my spreadsheet can handle – and found that the fraction grows as the input number n grows.
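The spreadsheet experiment can be repeated in a few lines of Python. This is only a sketch of the idea, not the original spreadsheet: it divides the exact sum of Logs by n Log n to see what fraction f of the box is filled. The helper name `fraction_of_box` and the sample values of n are assumptions made for illustration.

```python
import math

def fraction_of_box(n: int) -> float:
    """Return f such that Log 1 + ... + Log n = f * n * Log n (base 10).

    Needs n >= 2, because Log 1 is zero and would give a division by zero.
    """
    log_sum = sum(math.log10(k) for k in range(1, n + 1))
    return log_sum / (n * math.log10(n))

for n in (10, 25, 50, 100, 170):
    print(n, round(fraction_of_box(n), 3))
```

Running it for a wider range of n gives a feel for which fraction suits which size of input.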

Even so, the approximation is always going to be out by a small amount, so that when you exponentiate the sum, the result will be out by several orders of magnitude.

$$ n! \sim 10^{f \cdot n \operatorname{Log} n} $$

This is never going to be a good way to estimate factorials.


But let’s look at this a new way. I want a method that will give me an indication of how big a factorial is, and I know from my past investigations that I am generally only impressed by the size of the answer, not what digits are in it.


So as long as I am not going to use this answer as input for a further calculation – as in a binomial problem for example – then I can legitimately use the summation formula to get a ballpark notion of the size of a factorial, and accept that my answer could be out by a factor of 10 or even 100. It sounds huge, but in a number containing 20 or 30 digits, it is not such a big issue.

If I need accuracy, I will turn to technology to grind out the answer for me.

To illustrate, I will come back to my starting point, which is the number of translations for the English alphabet, which is 26 factorial.

Using my approach, I can get the order of magnitude for the answer very quickly by estimating

$$ 0.7 \times 26 \times \operatorname{Log} 26 $$

Now I don’t know what the Log of 26 is, but I can get the Log of 25 quite easily, and step up from there.


(Log 25 is double the size of Log 5)

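$$ \text{Size}(25!) \sim 0.7 \times 25 \times \operatorname{Log} 25 = 0.7 \times 25 \times 2 \operatorname{Log} 5 \approx 0.7 \times 25 \times 2 \times 0.7 = 24.5 \approx 25 $$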

25 factorial has about 25 digits in it, so 26 factorial should have 26 or 27, because it is 26 times bigger than 25 factorial.
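The mental arithmetic can be compared against an exact count. Here is a hedged Python sketch, assuming f = 0.7 as in the working above; `estimated_size` and `exact_digits` are illustrative names. Note that the "size" is a base-10 Log, so the digit count is the whole-number part of the size plus one.

```python
import math

def estimated_size(n: int, f: float) -> float:
    """Rough size of n! from the rule f * n * Log n (base 10)."""
    return f * n * math.log10(n)

def exact_digits(n: int) -> int:
    """Exact digit count, from Python's arbitrary-precision integers."""
    return len(str(math.factorial(n)))

for n in (25, 26):
    print(n, round(estimated_size(n, 0.7), 1), exact_digits(n))
```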

So the factorial of 26 is a number that has something like 26 digits in it. The answer is approximate. The true answer might be a number a hundred times bigger or a hundred times smaller, but on this scale I am not too worried. If I am to set up a computer program to go searching for alternate solutions to my crossword then I know I will need to investigate something like

100,000,000,000,000,000,000,000,000 possible arrangements

That’s a number measured (literally) in trillions of trillions, probably hundreds of trillions of trillions. I have to smile: that huge number comes out of a single crossword! Even if my hypothetical computer program can examine and discard a possible translation once every nanosecond, then I will have to run the program for 10^17 seconds, or 3 billion years.
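The running time is simple arithmetic, sketched here in Python. The checking rate of one translation per nanosecond is the hypothetical figure from the text, and the 365.25-day year is an assumption made for the conversion.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60   # about 3.16e7

translations = 10 ** 26          # the rounded estimate used in the text
checks_per_second = 10 ** 9      # one translation per nanosecond (assumed)

seconds = translations / checks_per_second
years = seconds / SECONDS_PER_YEAR
print(f"{seconds:.1e} seconds, roughly {years:.1e} years")
```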


Another example: This time a number so big that my spreadsheet – which demands absolute accuracy even if I don’t need it - can’t compute it.

$$ \text{Size}(1000!) \sim 0.8 \times 1000 \times \operatorname{Log} 1000 = 0.8 \times 1000 \times 3 = 2.4 \times 1000 = 2400 $$

I suspect that there’s not a single human being anywhere on this planet that needs to work with 1000 factorial. But I have demonstrated a way of estimating its size in a matter of seconds, something that my laptop can’t do. That’s pretty cool.

The answer may be out by half a dozen orders of magnitude or more, but I still have a sense of how large that number is, and that’s good enough for me.
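A spreadsheet gives up at 170 factorial, but a language with arbitrary-precision integers can count the digits of 1000 factorial directly, which makes it possible to see how far the rough estimate lands from the true size. A minimal Python sketch, assuming f = 0.8 as above:

```python
import math

n = 1000
estimate = 0.8 * n * math.log10(n)          # the rule of thumb, with f = 0.8
exact_digits = len(str(math.factorial(n)))  # Python integers have no size limit

print(round(estimate), exact_digits)
```

Trying a slightly larger f moves the estimate closer, in line with the observation that the fraction keeps growing with n.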


I mentioned earlier that this approach to estimating factorials would bring me close to Stirling’s formula. If you turn the summation into an integral, and use base e instead of base 10, then you are on your way.

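$$ \sum_{k=1}^{n} \operatorname{Ln} k \approx \int_{1}^{n} \operatorname{Ln} k \, dk \qquad\Rightarrow\qquad n! \sim e^{\int_{1}^{n} \operatorname{Ln} k \, dk} $$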

Stirling’s formula

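$$
\begin{aligned}
\int \operatorname{Ln} k \, dk &= k \operatorname{Ln} k - \int k \, (\operatorname{Ln} k)' \, dk \\
&= k \operatorname{Ln} k - \int k \cdot \frac{1}{k} \, dk \\
&= k \operatorname{Ln} k - k + \text{Const}
\end{aligned}
$$

$$ \int_{1}^{n} \operatorname{Ln} k \, dk = \Big[ k \operatorname{Ln} k - k \Big]_{1}^{n} = n \operatorname{Ln} n - n + 1 $$

$$ n! \sim e^{\, n \operatorname{Ln} n - n + 1} = e \cdot \frac{n^{n}}{e^{n}} = e \left( \frac{n}{e} \right)^{n} $$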

The formula looks great and our experience of integrals would have us believe that the approximation should get better as n becomes very large, when the granularity of the summation becomes less significant. But as it turns out the approximation gets worse as n gets larger. This is because the Log function never stops rising, so every step of the staircase pokes above the curve and the integral always underestimates the summation. These small errors don’t cancel out, they accumulate. And then when you exponentiate the errors, they grow huge.
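The drift can be seen numerically. The sketch below, in Python, evaluates the integral-based estimate e(n/e)^n and prints the ratio of the true factorial to it for a few values of n; the function name `integral_estimate` and the sample values are illustrative choices.

```python
import math

def integral_estimate(n: int) -> float:
    """e * (n/e)**n, from replacing the sum of Ln k by its integral."""
    return math.e * (n / math.e) ** n

for n in (5, 10, 20, 50, 100):
    ratio = math.factorial(n) / integral_estimate(n)
    print(n, round(ratio, 3))
```

A ratio that stayed near 1 would mean a good approximation; a ratio that keeps moving away from 1 is the accumulated error showing through.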

If you take a look at Stirling’s formula, you’ll see it has one more term than the formula I have created.

Stirling’s formula could be Log-ified (if that’s a word) and turned into a size formula like the one I have created, which should be more accurate than my earlier formula.

$$ n! \sim \sqrt{2 \pi n} \left( \frac{n}{e} \right)^{n} $$

This compensates for the errors and does a really nice job of approximating factorials when n is large.

But it’s not the kind of formula you can use without some kind of technology; great for the scientists but not for people walking in the park.

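$$
\begin{aligned}
\operatorname{Log} n! &\sim \operatorname{Log} \left[ \sqrt{2 \pi n} \left( \frac{n}{e} \right)^{n} \right] \\
&= \operatorname{Log} \sqrt{2 \pi n} + n \operatorname{Log} \frac{n}{e} \\
&= \tfrac{1}{2} \operatorname{Log} (2 \pi n) + n \operatorname{Log} \frac{n}{e}
\end{aligned}
$$

$$ \text{Size}(n!) \sim \tfrac{1}{2} \operatorname{Log} (2 \pi n) + n \operatorname{Log} \frac{n}{e}, \qquad n! \sim 10^{\text{Size}(n!)} $$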

Without even testing it I can be sure that the formula is accurate. But it is a lot of work for one head, and a long thing to file away in memory for those odd occasions when I would use it. I find the earlier formula, even with its imperfections, easier to remember and easier to use.
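For readers who do want to test it, here is a short Python sketch that compares the Log-ified Stirling size with the Log of the exact factorial. The function names `stirling_size` and `true_size` are illustrative, and base-10 Logs are assumed as elsewhere in this essay.

```python
import math

def stirling_size(n: int) -> float:
    """Base-10 Log of Stirling's approximation sqrt(2*pi*n) * (n/e)**n."""
    return 0.5 * math.log10(2 * math.pi * n) + n * math.log10(n / math.e)

def true_size(n: int) -> float:
    """Base-10 Log of the exact factorial."""
    return math.log10(math.factorial(n))

for n in (26, 170, 1000):
    print(n, round(stirling_size(n), 3), round(true_size(n), 3))
```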

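$$ \text{Size}(n!) \sim \tfrac{1}{2} \operatorname{Log} (2 \pi n) + n \operatorname{Log} \frac{n}{e} = \tfrac{1}{2} \operatorname{Log} (2 \pi n) + n \left( \operatorname{Log} n - 0.43 \right) $$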
