random numbers for large-scale distributed monte carlo simulations · 2007. 6. 21. · random...

14
Random numbers for large-scale distributed Monte Carlo simulations Heiko Bauke * Institut für Theoretische Physik, Universität Magdeburg, Universitätsplatz 2, 39016 Magdeburg, Germany and Rudolf Peierls Centre for Theoretical Physics, University of Oxford, 1 Keble Road, Oxford, OX1 3NP, United Kingdom Stephan Mertens Institut für Theoretische Physik, Universität Magdeburg, Universitätsplatz 2, 39016 Magdeburg, Germany Received 19 September 2006; published 6 June 2007 Monte Carlo simulations are one of the major tools in statistical physics, complex system science, and other fields, and an increasing number of these simulations is run on distributed systems like clusters or grids. This raises the issue of generating random numbers in a parallel, distributed environment. In this contribution we demonstrate that multiple linear recurrences in finite fields are an ideal method to produce high quality pseudorandom numbers in sequential and parallel algorithms. Their known weakness failure of sampling points in high dimensions can be overcome by an appropriate delinearization that preserves all desirable properties of the underlying linear sequence. DOI: 10.1103/PhysRevE.75.066701 PACS numbers: 02.70.c, 02.70.Rr, 05.10.Ln I. INTRODUCTION The Monte Carlo method is a major industry and random numbers are its key resource. In contrast to most commodi- ties, quantity and quality of randomness have an inverse re- lation: The more randomness you consume, the better it has to be 1. The quality issue arises because simulations only approximate randomness by generating a stream of determin- istic numbers, named pseudorandom numbers PRNs, with successive calls to a pseudorandom number generator PRNG. Considering the ever increasing computing power, however, the quality of PRNGs is a moving target. Simula- tions that consume 10 10 pseudorandom numbers were con- sidered large-scale Monte Carlo simulation a few years ago. On a present-day desktop machine this is a ten minute run. More and more large-scale simulations are run on parallel systems like networked workstations, clusters, or “the grid.” In a parallel environment the quality of a PRNG is even more important, to some extent because feasible sample sizes are easily 10, ... ,10 5 times larger than on a sequential ma- chine. The main problem is the parallelization of the PRNG itself, however. Some good generators are not parallelizable at all, others lose their efficiency, their quality, or even both when being parallelized 2,3. In this contribution we discuss linear recurrences in finite fields. They excel as sources of pseudorandomness because they have a solid mathematical foundation and they can eas- ily be parallelized. Their linear structure, however, may cause problems in experiments that sample random points in high dimensional space. This is a known issue “random numbers fall mainly in the planes” 4, but it can be over- come by a proper delinearization of the sequence. The paper is organized as follows. In Sec. II we briefly review some properties of linear recurrences in finite fields, and we motivate their use as PRNGs. In Sec. III we discuss various techniques for parallelizing a PRNG. We will see that only two of these techniques allow for parallel simula- tions whose results do not necessarily depend on the number of processes, and we will see how linear recurrences support these parallelization techniques. Section IV addresses the problem of linear sequences to sample points in high dimen- sions, measured by the spectral test. We show that with a proper transformation delinearization we can generate a new sequence that shares all nice properties with the original sequence like parallelizability, equidistribution properties, etc., but passes many statistical tests that are sensitive to the hyperplane structure of points of linear sequences in high dimensions. In the last two sections we discuss the quality of the linear and nonlinear sequences in parallel Monte Carlo applications and how to implement these generators effi- ciently. II. RECURRENT RANDOMNESS Almost all PRNGs produce a sequence r = r 1 , r 2 ,... of pseudorandom numbers by a recurrence r i = f r i-1 , r i-2 ,..., r i-k , 1 and the art of random number generation lies in the design of the function f . Numerous recipes for f have been discussed in the literature 5,6. One of the oldest and most popular PRNGs is the linear congruential generator LCG r i = ar i-1 + b mod m 2 introduced by Lehmer 7. The additive constant b may be zero. Knuth 8 proposed a generalization of Lehmer’s method known as multiple recursive generator MRG 6,9,10 r i = a 1 r i-1 + a 2 r i-2 + ¯ + a n r i-n mod m . 3 Surprisingly, on a computer all recurrences are essentially linear. This is due to the simple fact that computers operate on numbers of finite precision; let us say integers between 0 *Email address: [email protected] Email address: [email protected] PHYSICAL REVIEW E 75, 066701 2007 1539-3755/2007/756/06670114 ©2007 The American Physical Society 066701-1

Upload: others

Post on 16-Oct-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Random numbers for large-scale distributed Monte Carlo simulations · 2007. 6. 21. · Random numbers for large-scale distributed Monte Carlo simulations Heiko Bauke* Institut für

Random numbers for large-scale distributed Monte Carlo simulations

Heiko Bauke*Institut für Theoretische Physik, Universität Magdeburg, Universitätsplatz 2, 39016 Magdeburg, Germany

and Rudolf Peierls Centre for Theoretical Physics, University of Oxford, 1 Keble Road, Oxford, OX1 3NP, United Kingdom

Stephan Mertens†

Institut für Theoretische Physik, Universität Magdeburg, Universitätsplatz 2, 39016 Magdeburg, Germany�Received 19 September 2006; published 6 June 2007�

Monte Carlo simulations are one of the major tools in statistical physics, complex system science, and otherfields, and an increasing number of these simulations is run on distributed systems like clusters or grids. Thisraises the issue of generating random numbers in a parallel, distributed environment. In this contribution wedemonstrate that multiple linear recurrences in finite fields are an ideal method to produce high qualitypseudorandom numbers in sequential and parallel algorithms. Their known weakness �failure of samplingpoints in high dimensions� can be overcome by an appropriate delinearization that preserves all desirableproperties of the underlying linear sequence.

DOI: 10.1103/PhysRevE.75.066701 PACS number�s�: 02.70.�c, 02.70.Rr, 05.10.Ln

I. INTRODUCTION

The Monte Carlo method is a major industry and randomnumbers are its key resource. In contrast to most commodi-ties, quantity and quality of randomness have an inverse re-lation: The more randomness you consume, the better it hasto be �1�. The quality issue arises because simulations onlyapproximate randomness by generating a stream of determin-istic numbers, named pseudorandom numbers �PRNs�, withsuccessive calls to a pseudorandom number generator�PRNG�. Considering the ever increasing computing power,however, the quality of PRNGs is a moving target. Simula-tions that consume 1010 pseudorandom numbers were con-sidered large-scale Monte Carlo simulation a few years ago.On a present-day desktop machine this is a ten minute run.

More and more large-scale simulations are run on parallelsystems like networked workstations, clusters, or “the grid.”In a parallel environment the quality of a PRNG is evenmore important, to some extent because feasible sample sizesare easily 10, . . . ,105 times larger than on a sequential ma-chine. The main problem is the parallelization of the PRNGitself, however. Some good generators are not parallelizableat all, others lose their efficiency, their quality, or even bothwhen being parallelized �2,3�.

In this contribution we discuss linear recurrences in finitefields. They excel as sources of pseudorandomness becausethey have a solid mathematical foundation and they can eas-ily be parallelized. Their linear structure, however, maycause problems in experiments that sample random points inhigh dimensional space. This is a known issue �“randomnumbers fall mainly in the planes” �4��, but it can be over-come by a proper delinearization of the sequence.

The paper is organized as follows. In Sec. II we brieflyreview some properties of linear recurrences in finite fields,and we motivate their use as PRNGs. In Sec. III we discuss

various techniques for parallelizing a PRNG. We will seethat only two of these techniques allow for parallel simula-tions whose results do not necessarily depend on the numberof processes, and we will see how linear recurrences supportthese parallelization techniques. Section IV addresses theproblem of linear sequences to sample points in high dimen-sions, measured by the spectral test. We show that with aproper transformation �delinearization� we can generate anew sequence that shares all nice properties with the originalsequence �like parallelizability, equidistribution properties,etc.�, but passes many statistical tests that are sensitive to thehyperplane structure of points of linear sequences in highdimensions. In the last two sections we discuss the quality ofthe linear and nonlinear sequences in parallel Monte Carloapplications and how to implement these generators effi-ciently.

II. RECURRENT RANDOMNESS

Almost all PRNGs produce a sequence �r�=r1 ,r2 , . . . ofpseudorandom numbers by a recurrence

ri = f�ri−1,ri−2, . . . ,ri−k� , �1�

and the art of random number generation lies in the design ofthe function f . Numerous recipes for f have been discussedin the literature �5,6�. One of the oldest and most popularPRNGs is the linear congruential generator �LCG�

ri = ari−1 + b mod m �2�

introduced by Lehmer �7�. The additive constant b may bezero. Knuth �8� proposed a generalization of Lehmer’smethod known as multiple recursive generator �MRG��6,9,10�

ri = a1ri−1 + a2ri−2 + ¯ + anri−n mod m . �3�

Surprisingly, on a computer all recurrences are essentiallylinear. This is due to the simple fact that computers operateon numbers of finite precision; let us say integers between 0

*Email address: [email protected]†Email address: [email protected]

PHYSICAL REVIEW E 75, 066701 �2007�

1539-3755/2007/75�6�/066701�14� ©2007 The American Physical Society066701-1

Page 2: Random numbers for large-scale distributed Monte Carlo simulations · 2007. 6. 21. · Random numbers for large-scale distributed Monte Carlo simulations Heiko Bauke* Institut für

and rmax. This has two important consequences.�1� Every recurrence �1� must be periodic.�2� We can embed the sequence �r� into the finite field Fm,

the field of integers with arithmetic modulo m, where m is aprime number larger than the largest value in the sequence�r�.

The relevance of linear sequences is then based on thefollowing fact ��11�, corollary 6.2.3�:

Theorem 1. All finite length sequences and all periodicsequences over a finite field Fm can be generated by a linearrecurrence �3�.

In the theory of finite fields, a sequence of type �3� iscalled linear feedback shift register sequence, or LFSR se-quence for short. Note that a LFSR sequence is fully deter-mined by specifying n coefficients �a1 ,a2 , . . . ,an� plus n ini-tial values �r1 ,r2 , . . . ,rn�.

According to Theorem 1 all finite length sequences and allperiodic sequences are LFSR sequences. The linear complex-ity of a sequence is the minimum order of a linear recurrencethat generates this sequence. The Berlekamp-Massey algo-rithm �11,12� is an efficient procedure for finding this short-est linear recurrence. The linear complexity of a random se-quence of length T on average equals T /2 �13�. As aconsequence a random sequence cannot be compressed bycoding it as a linear recurrence, because T /2 coefficients plusT /2 initial values have to be specified. Typically the linearcomplexity of a nonlinear sequence �1� is of the same orderof magnitude as the period of the sequence. Since the design-ers of PRNGs strive for “astronomically” large periods,implementing nonlinear generators as a linear recurrence isnot tractable. As a matter of principle, however, all we needto discuss are linear recurrences, and any nonlinear recur-rence can be seen as an efficient implementation of a largeorder linear recurrence. The popular PRNG “MersenneTwister” �14�, for example, is an efficient implementation ofa linear recurrence in F2 of order 19 937.

There is a wealth of rigorous results on LFSR sequencesthat can �and should� be used to construct a good PRNG.Here we only discuss a few but important facts withoutproofs. A detailed discussion of LFSR sequences includingproofs can be found in �11,15–19�.

Since the all zero tuple �0,0,…,0� is a fixed point of Eq.�3�, the maximum period of a LFSR sequence cannot exceedmn−1. The following theorem tells us precisely how tochoose the coefficients �a1 ,a2 , . . . ,an� to achieve this period�5�.

Theorem 2. The LFSR sequence �3� over Fm has periodmn−1, if and only if the characteristic polynomial

f�x� = xn − a1xn−1 − a2xn−2 − ¯ − an �4�

is primitive modulo m.A monic polynomial f�x� of degree n over Fm is primitive

modulo m, if and only if it is irreducible �i.e., cannot befactorized over Fm�, and if it has a primitive element of theextension field Fmn as one of its roots. The number of primi-tive polynomials of degree n modulo m is equal to ��mn

−1� /n=O�mn / (n ln(n ln m))� �20�, where ��x� denotes Eul-er’s totient function. As a consequence a random polynomial

of degree n is primitive modulo m with probability�1/ (n ln�n ln m�), and finding primitive polynomials re-duces to testing whether a given polynomial is primitive. Thelatter can be done efficiently, if the factorization of mn−1 isknown �11�, and most computer algebra systems offer a pro-cedure for this.

Theorem 3. Let �r� be an LFSR sequence �3� with a primi-tive characteristic polynomial. Then each k-tuple�ri+1 , . . . ,ri+k�� �0,1 , . . . ,m−1�k occurs mn−k times per pe-riod for k�n �except the all zero tuple for k=n, which neveroccurs�.

From this theorem it follows that, if a tuple of k succes-sive terms k�n is drawn from a random position of a LFSRsequence, the outcome is uniformly distributed over all pos-sible k-tuples in Fm. This is exactly what one would expectfrom a truly random sequence. In terms of Compagner’s en-semble theory, tuples of length k�n of a LFSR sequencewith primitive characteristic polynomial are indistinguish-able from truly random tuples �21,22�.

Theorem 4. Let �r� be an LFSR sequence �3� with periodT=mn−1 and let � be a complex mth root of unity and �̄ itscomplex conjugate. Then

C�h� ª �i=1

T

�ri�̄ri+h = T if h = 0 mod T

− 1 if h � 0 mod T . �5�

C�h� can be interpreted as the autocorrelation functionof the sequence, and Theorem 4 tells us that LFSR sequenceswith maximum period have autocorrelations that are verysimilar to the autocorrelations of a random sequence withperiod T. Together with the nice equidistribution properties�Theorem 3� this qualifies LFSR sequences with maximumperiod as pseudonoise sequences, a term originally coined byGolomb for binary sequences �11,15�.

III. PARALLELIZATION

A. General parallelization techniques

In parallel applications we need to generate streams tj,i ofrandom numbers, where j=0,1 , . . . , p−1 numbers thestreams for each of the p processes. We require statisticalindependence of the tj,i within each stream and between thestreams. Four different parallelization techniques are used inpractice.

Random seeding. All processes use the same PRNG but adifferent “random” seed. The hope is that they will generatenonoverlapping and uncorrelated subsequences of the origi-nal PRNG. This hope, however, has no theoretical founda-tion. Random seeding is a violation of Donald Knuth’s ad-vice, “Random number generators should not be chosen atrandom” �5�.

Parametrization. All processes use the same type of gen-erator but with different parameters for each processor. Ex-ample: linear congruential generators with additive constantbj for the jth stream �23�

tj,i = atj,i−1 + bj mod m , �6�

where the bj’s are different prime numbers just below �m /2.Another variant uses different multipliers a for different

HEIKO BAUKE AND STEPHAN MERTENS PHYSICAL REVIEW E 75, 066701 �2007�

066701-2

Page 3: Random numbers for large-scale distributed Monte Carlo simulations · 2007. 6. 21. · Random numbers for large-scale distributed Monte Carlo simulations Heiko Bauke* Institut für

streams �24�. The theoretical foundation of these methods isweak, and empirical tests have revealed serious correlationsbetween streams �25�. On a massive parallel system you mayneed thousands of parallel streams, and it is not trivial to finda type of PRNG with thousands of “well tested” parametersets.

Block splitting. Let M be the maximum number of calls toa PRNG by each processor, and let p be the number of pro-cesses. Then we can split the sequence �r� of a sequentialPRNG into consecutive blocks of length M such that

t0,i = ri,

t1,i = ri+M ,�7�

]

tp−1,i = ri+M�p−1�.

This method works only if we know M in advance or can atleast safely estimate an upper bound for M. To apply blocksplitting it is necessary to jump from the ith random numberto the �i+M�th number without calculating the numbers in-between, which cannot be done efficiently for many PRNGs.A potential disadvantage of this method is that long rangecorrelations, usually not observed in sequential simulations,may become short range correlations between substreams�26,27�. Block splitting is illustrated in Fig. 1.

Leapfrog. The leapfrog method distributes a sequence �r�of random numbers over p processes by decimating this basesequence such that

t0,i = rpi,

t1,i = rpi+1,�8�

]

tp−1,i = rpi+�p−1�.

Leapfrogging is illustrated in Fig. 2. It is the most versatileand robust method for parallelization and it does not requirean a priori estimate of how many random numbers will beconsumed by each processor. An efficient implementationrequires a PRNG that can be modified to generate directlyonly every kth element of the original sequence. Again thisexcludes many popular PRNGs.

Note that for a periodic sequence �r� the substreams de-rived from block splitting are cyclic shifts of the originalsequence �r�, and for p not dividing the period of �r�, theleapfrog sequences are cyclic shifts of each other. Hence theleapfrog method is equivalent to block splitting on a differentbase sequence.

B. Playing fair

We say that a parallel Monte Carlo simulation plays fair,if its outcome is strictly independent of the underlying hard-ware, where strictly means deterministically, not just statis-tically. Fair play implies the use of the same PRNs in thesame context, independently of the number of parallel pro-cesses. It is mandatory for debugging, especially in parallelenvironments where the number of parallel processes variesfrom run to run, but another benefit of playing fair is evenmore important: Fair play guarantees that the quality of aPRNG with respect to an application does not depend on thedegree of parallelization.

Obviously, parametrization or random seeding as dis-cussed above prevent a simulation from playing fair. Leap-frog and block splitting, on the other hand, do allow one touse the same PRNs within the same context independently ofthe number of parallel streams.

Consider the site percolation problem. A site in a lattice ofsize N is occupied with some probability, and the occupancyis determined by a PRN. M random configurations are gen-erated. A naive parallel simulation on p processes could splita base sequence into p leapfrog streams and having eachprocess generate �M / p lattice configurations, independentlyof the other processes. Obviously, this parallel simulation isnot equivalent to its sequential version that consumes PRNsfrom the base sequence to generate one lattice configurationafter another. The effective shape of the resulting lattice con-figurations depends on the number of processes. This parallelalgorithm does not play fair.

We can turn the site percolation simulation into a fairplaying algorithm by leapfrogging on the level of lattice con-figurations. Here each process consumes distinct contiguousblocks of PRNs from the sequence �r�, and the workload isspread over p processors in such a way, that each processanalyzes each pth lattice. If we number the processes by theirrank i from 0 to p−1 and the lattices from 0 to M −1, eachprocess starts with a lattice whose number equals its ownrank. That means process i has to skip iN PRNs before thefirst lattice configuration is generated. Thereafter each pro-cess can skip p−1 lattices, i.e., �p−1�N PRNs and continuewith the next lattice.

ri

t

t

t

i

1, i

i

0,

2,

FIG. 1. Parallelization by block splitting.

r

t

t

t

i

2,

0,

1,

i

i

i

FIG. 2. Parallelization by leapfrogging.

RANDOM NUMBERS FOR LARGE-SCALE DISTRIBUTED… PHYSICAL REVIEW E 75, 066701 �2007�

066701-3

Page 4: Random numbers for large-scale distributed Monte Carlo simulations · 2007. 6. 21. · Random numbers for large-scale distributed Monte Carlo simulations Heiko Bauke* Institut für

Organizing simulation algorithms such that they play fairis not always as easy as in the above example, but with alittle effort one can achieve fair play in more complicatedsituations, too. This may require the combination of blocksplitting and the leapfrog method, or iterated leapfrogging.Sometimes it is also necessary to use more than one streamof PRNs per process, e.g., in the Swendsen Wang clusteralgorithm one may use one PRNG to construct the bondpercolation clusters and another PRNG to decide to flip theclusters.

C. Parallelization of LFSR sequences

As a matter of fact, LFSR sequences do support leapfrogand block splitting very well. Block splitting means basicallyjumping ahead in a PRN sequence. In the case of LFSRsequences this can be done quite efficiently. Note, that byintroducing a companion matrix A the linear recurrence �3�can be written as a vector matrix product �28�.

�9�

From this formula it follows immediately that the M-foldsuccessive iteration of Eq. �3� may be written as

ri−�n−1�

]

ri−1

ri

� = AM ri−M−�n−1�

]

ri−M−1

ri−M

� mod m . �10�

Matrix exponentiation can be accomplished in O�n3 ln M�steps via binary exponentiation �also known as exponentia-tion by squaring�.

Implementing leapfrogging efficiently is less straightfor-ward. Calculating tj,i=rpi+j via

rpi+j−�n−1�

]

rpi+j−1

rpi+j

� = Ap rp�i−1�+j−�n−1�

]

rp�i−1�+j−1

rp�i−1�+j

� mod m �11�

is no option, because Ap is usually a dense matrix, in whichcase calculating a new element from the leapfrog sequencerequires O�n2� operations instead of O�n� operations in thebase sequence.

The following theorem assures that the leapfrog subse-quences of LFSR sequences are again LFSR sequences �11�.This will provide us with a very efficient way to generateleapfrog sequences.

Theorem 5. Let �r� be a LFSR sequence based on a primi-tive polynomial of degree n with period mn−1 �pseudonoisesequence� over Fm, and let �t� be the decimated sequencewith lag p�0 and offset j, e.g.,

tj,i = rpi+j . �12�

Then �tj� is a LFSR sequence based on a primitive polyno-mial of degree n, too, if and only if p and mn−1 are coprime,e.g., greatest common divisor �gcd��mn−1, p�=1. In addi-tion, �r� and �tj� are not just cyclic shifts of each other, ex-cept when

p = mh �13�

for some 0�h�n. If gcd�mn−1, p��1 the sequence �tj� isstill a LFSR sequence, but not a pseudonoise sequence.

It is not hard to find prime numbers m such that mn−1 hasvery few �and large� prime factors. For such numbers, theleapfrog method yields pseudonoise sequences for any rea-sonable number of parallel streams �see also Sec. III D andthe Appendix�.

While Theorem 5 ensures that leapfrog sequences are notjust cyclic shifts of the base sequence �unless Eq. �13� holds�,the leapfrog sequences are cyclic shifts of each other. This istrue for general sequences, not just LFSR sequences. Con-sider an arbitrary sequence �r� with period T. If gcd�T , p�=1, all leapfrog sequences �t1�, �t2� , . . . , �tp� are cyclic shiftsof each other, i.e., for every pair of leapfrog sequences �tj1

�and �tj2

� of a common base sequence �r� with period T thereis a constant s, such that tj1,i= tj2,i+s for all i, and s is at least�T / p�. Furthermore, if gcd�T , p�=d�1, the period of eachleapfrog sequence equals T /d and there are d classes of leap-frog sequences. Within a class of leapfrog sequences thereare p /d sequences, each sequence is just a cyclic shift ofanother and the size of the shift is at least �T / p�.

Theorem 5 tells us that all leapfrog sequences of a LFSRsequence of degree n can be generated by another LFSR ofdegree n or less. The following theorem ��11�, Theorem6.6.1� gives us a recipe to calculate the coefficients�b1 ,b2 , . . . ,bn� of the corresponding leapfrog feedback poly-nomial.

Theorem 6. Let �t� be a �periodic� LFSR sequence overthe field Fm and f�x� its characteristic polynomial of degreen. Then the coefficients �b1 ,b2 , . . . ,bn� of f�x� can be com-puted from 2n successive elements of �t� by solving the lin-ear system

ti+n

ti+n+1

]

ti+2n−1

� = ti+n−1 . . . ti+1 ti

ti+n . . . ti+2 ti+1

] � ] ]

ti+2n−2 . . . ti+n ti+n−1

� b1

b2

]

bn

� �14�

over Fm.Starting from the base sequence we determine 2n values

of the sequence �t� by applying the leapfrog rule. Then wesolve Eq. �14� by Gaussian elimination to get the character-istic polynomial for a new LFSR generator that yields theelements of the leapfrog sequence directly with each call. Ifthe matrix in Eq. �14� is singular, the linear system has morethan one solution, and it is sufficient to pick one of them. Inthis case it is always possible to generate the leapfrog se-quence by a LFSR of degree less than the degree of theoriginal sequence.

HEIKO BAUKE AND STEPHAN MERTENS PHYSICAL REVIEW E 75, 066701 �2007�

066701-4

Page 5: Random numbers for large-scale distributed Monte Carlo simulations · 2007. 6. 21. · Random numbers for large-scale distributed Monte Carlo simulations Heiko Bauke* Institut für

D. Choice of modulus

LFSR sequences over F2 with sparse feedback polynomi-als are popular sources of PRNs �5,29,30� and generators ofthis type can be found in various software libraries. This isdue to the fact that multiplication in F2 is trivial, additionreduces to exclusive-OR and the mod operation comes forfree. As a result, generators that operate in F2 are extremelyfast. Unfortunately, these generators suffer from serious sta-tistical defects �30–33� that can be blamed quite generally onthe small size of the underlying field �34� �see also Sec.VI A�. In parallel applications we have the additional draw-back, that, if the leapfrog method is applied to a LFSR se-quence with sparse characteristic polynomial, the new se-quence will have a dense polynomial. The computationalcomplexity of generating values of the LFSR sequencegrows from O�1� to O�n�. Remember that for generators inF2, n is typically of order 1000 or even larger to get a longperiod 2n−1.

The theorems and parallelization techniques we have pre-sented so far do apply to LFSR sequences over any finitefield Fm. Therefore we are free to choose the prime modulusm. In order to get maximum entropy on the macrostate level�35� m should be as large as possible. A good choice is to setm to a value that is of the order of the largest representableinteger of the computer. If the computer deals with e-bitregisters, we may write the modulus as m=2e−k, with kreasonably small. In fact if k�k+2��m modular reductioncan be done reasonably fast by a few bit-shifts, additions,and multiplications �see Sec. V A�. Furthermore a largemodulus allows us to restrict the degree of the LFSR torather small values, e.g., n�4, while the PRNG has a largeperiod and good statistical properties.

In accordance with Theorem 5 a leapfrog sequence of apseudonoise sequence is a pseudonoise sequence, too, if andonly if its period mn−1 and the lag p are coprime. For thatreason mn−1 should have a small number of prime factors�36�. It can be shown that mn−1 has at least three primefactors and if the number of prime factors does not exceedthree, then m is necessarily a Sophie-Germain prime and n aprime larger than two �see the Appendix for details�.

To sum up, the modulus m of a LFSR sequence should bea Sophie-Germain prime, such that mn−1 has no more thanthree prime factors and such that m=2e−k and k�k+2��mfor some integers e and k.

IV. DELINEARIZATION

LFSR sequences over prime fields with a large primemodulus seem to be ideally suited as PRNGs, especially inparallel environments. They have, however, a well knownweakness. When used to sample coordinates ind-dimensional space, pseudonoise sequences cover everypoint for d�n, and every point except �0,0,…,0� for d=n.For d�n the set of positions generated is obviously sparse,and the linearity of the production rule �3� leads to the con-centration of the sampling points on n-dimensional hyper-planes �37,38� �see also the top of Fig. 3�. This phenomenon,first noticed by Marsaglia in 1968 �4�, motivates one of thewell known tests for PRNGs, the so-called spectral test �5�.

The spectral test checks the behavior of a generator when itsoutputs are used to form d tuples. LFSR sequences can failthe spectral test in dimensions larger than the register size n.Closely related to this mechanism are the observed correla-tions in other empirical tests such as the birthday spacingstest and the collision test �39,40�. Nonlinear generators doquite well in all these tests, but compared to LFSR sequencesthey have much less nice and provable properties and theyare not suited for fair playing parallelization.

To get the best of both worlds we propose a delineariza-tion that preserves all the nice properties of linear pseud-onoise sequences:

Theorem 7. Let �q� be a pseudonoise sequence in Fm, and

0 0.2 0.4 0.6 0.8 10

0.2

0.4

0.6

0.8

1

ri/1999

r i+1/1

999

0 0.2 0.4 0.6 0.8 10

0.2

0.4

0.6

0.8

1

ri/1999

r i+1/1

999

(a)

(b)

FIG. 3. Exponentiation of a generating element in a prime fieldis an effective way to destroy the linear structures of LFSR se-quences. Both pictures show the full period of the generator. Top:ri=95ri−i mod 1999. Bottom: ri=1099qi mod 1999 with qi

=95qi−i mod 1999.

RANDOM NUMBERS FOR LARGE-SCALE DISTRIBUTED… PHYSICAL REVIEW E 75, 066701 �2007�

066701-5

Page 6: Random numbers for large-scale distributed Monte Carlo simulations · 2007. 6. 21. · Random numbers for large-scale distributed Monte Carlo simulations Heiko Bauke* Institut für

let g be a generating element of the multiplicative group Fm* .

Then the sequence �r� with

ri = gqi mod m if qi � 0

0 if qi = 0 �15�

is a pseudonoise sequence, too.The proof of this theorem is trivial: since g is a generator

of Fm* , the map �15� is bijective. We call delinearized genera-

tors based on Theorem 7 YARN generators �yet another ran-dom number�.

The linearity is completely destroyed by the map �15� �seeFig. 3�. Let L�r��l� denote the linear complexity of the subse-quence �r1 ,r2 , . . . ,rl�. This function is known as the linearcomplexity profile of �r�. For a truly random sequence itgrows on average like l /2. Figure 4 shows the linear com-plexity profile L�r��l� of a typical YARN sequence. It showsthe same growth rate as a truly random sequence up to thepoint where more than 99% of the period has been consid-ered. Sharing the linear complexity profile with a truly ran-dom sequence, we may say that the YARN generator is asnonlinear as it can get.

V. IMPLEMENTATION AND EFFICIENCY

LFSR sequences over prime fields larger than F2 havebeen proposed as PRNGs in the literature �5,28,37�. An effi-cient implementation of these recurrences requires somecare, however.

We assume that all integer arithmetic is done in w-bitregisters and m�2w−1. Under this condition addition modulom can be done without overflow problems. But multiplyingtwo �w−1�-bit integers modulo m is not straightforward be-cause the intermediate product has 2�w−1� significant bitsand cannot be stored in a w-bit register. For the special caseak��m Schrage �41� showed how to calculate akri−k mod mwithout overflow. Based on this technique a portable imple-

mentation of LFSR sequences with coefficients ak��m ispresented in �10�. For parallel PRNGs this method does notapply because the leapfrog method may yield coefficientsthat violate this condition. Knuth ��5�, Sec. 3.2.1.1� proposeda generalization of Schrage’s method for arbitrary positivefactors less than m, but this method requires up to twelvemultiplications and divisions and is therefore not very effi-cient.

The only way to implement Eq. �3� without additionalmeasures to circumvent overflow problems is to restrict m tom�2w/2. On machines with 32-bit registers, 16 random bitsper number is not enough for some applications. Fortunatelytoday’s C compilers provide fast 64-bit arithmetic even on32-bit CPUs and genuine 64-bit CPUs become more andmore common. This allows us to increase m to 32.

A. Efficient modular reduction

Since the modulo operation in Eq. �3� is usually slowerthan other integer operations such as addition, multiplication,boolean operations, or shifting, it has a significant impact onthe total performance of PRNGs based on LFSR sequences.If the modulus is a Mersenne prime m=2e−1, however, themodulo operation can be done using only a few additions,boolean operations, and shift operations �42�.

A summand s=akri−k in Eq. �3� will never exceed �m−1�2= �2e−2�2 and for each positive integer s� �0, �2e−1�2�there is a unique decomposition of s into

s = r2e + q with 0 � q � 2e. �16�

From this decomposition we conclude

s − r2e = q ,

s − r�2e − 1� = q + r ,

s mod�2e − 1� = q + r mod�2e − 1� ,

and r and q are bounded from above by

q � 2e and r � ��2e − 2�2/2e� � 2e − 2,

respectively, and therefore

q + r � 2e + 2e − 2 = 2m .

So if m=2e−1 and s� �m−1�2, x=s mod m can be calculatedsolely by shift operations, boolean operations, and addition,viz.,

x = �s mod 2e� + �s/2e� . �17�

If Eq. �17� yields a value x�m we simply subtract m.From a computational point of view, Mersenne prime

moduli are optimal and we propose to choose the modulusm=231−1. This is the largest positive integer that can berepresented by a signed 32-bit integer variable, and it is alsoa Mersenne prime. On the other hand, our theoretical consid-erations favor Sophie-Germain prime moduli, for which Eq.�17� does not apply directly. But one can generalize Eq. �17�to moduli 2e−k �43�. Again we start from a decomposition ofs into

0 0.2 0.4 0.6 0.8 1 1.20

0.1

0.2

0.3

0.4

0.5

l/T

L(r

)(l)/

T

l/(2T)linear complexity profile

FIG. 4. Linear complexity profile L�r��l� of a YARN sequence,produced by the recurrence qi=173qi−1+219qi−2 mod 317 and ri

=151qi mod 317. The period of this sequence equals T=3172−1.

HEIKO BAUKE AND STEPHAN MERTENS PHYSICAL REVIEW E 75, 066701 �2007�

066701-6

Page 7: Random numbers for large-scale distributed Monte Carlo simulations · 2007. 6. 21. · Random numbers for large-scale distributed Monte Carlo simulations Heiko Bauke* Institut für

s = r2e + q with 0 � q � 2e, �18�

and conclude

s − r2e = q ,

s − r�2e − k� = q + kr ,

s mod�2e − k� = q + kr mod�2e − k� .

The sum s�=q+kr exceeds the modulus at most by a factork+1, because by applying

q � 2e and r � ��2e − k − 1�2/2e� � 2e − k − 1,

we get the bound

q + kr � 2e + k�2e − k − 1� = �k + 1�m .

In addition, by the decomposition of s�=q+kr,

s� = r�2e + q� with 0 � q� � 2e,

it follows

s mod�2e − k� = s� mod�2e − k� = q� + kr� mod�2e − k� ,

and this time the bounds

q� � 2e and r� � ��k + 1��2e − k�/2e� � k + 1

and

q� + kr� � 2e + k�k + 1� = m + k�k + 2�

hold. Therefore if m=2e−k, s� �m−k�2 and k�k+2��m, x=s mod m can be calculated solely by shift operations, bool-ean operations, and addition, viz.,

s� = �s mod 2e� + k�s/2e� ,

x = �s� mod 2e� + k�s�/2e� . �19�

If Eq. �19� yields a value x�m, a single subtraction of m willcomplete the modular reduction. To carry out Eq. �19�, twiceas many operations as for Eq. �17� are needed, but Eq. �19�applies for all moduli m=2e−k with k�k+2��m. Note thaton systems with big enough word size �such as 64-bit archi-tectures�, just doing the mod operation might well be fasterthan using Eq. �19�.

B. Fast delinearization

YARN generators hide linear structures of LFSR se-quences �q� by raising a generating element g to the powergqi mod m. This can be done efficiently by binary exponen-tiation, which takes O�log m� steps. But considering LFSRsequences with only a few feedback taps �n�6� and m�231 even fast exponentiation is significantly more expen-sive than a single iteration of Eq. �3�. Therefore we proposeto implement exponentiation by table lookup. If m is a2e�-bit number we apply the decomposition

qi = qi,12e� + qi,0

with

qi,1 = �qi/2e��, qi,0 = qi mod 2e�, �20�

and use the identity

ri = gqi mod m = �g2e��qi,1gqi,0 mod m �21�

to calculate gqi mod m by two table lookups and one multi-

plication modulo m. If m�231 the tables for �g2e��qi,1 mod m

and gqi,0 mod m have 216 and 215 entries, respectively. These384 kbytes of data easily fit into the cache of modern CPUs.

C. TRNG

LFSR sequences and their nonlinear counterparts havebeen implemented in the TRNG software package, a portableand highly optimized library of parallelizable PRNGs. Paral-lelization by block splitting as well as by leapfrogging aresupported by all generators. TRNG is publicly available fordownload �44�. Its design is based on a proposal for the nextrevision of the ISO C++ standard �48�. TRNG uses 64-bitarithmetic, fast modular reduction �17� and �19�, and expo-nentiation by table lookup �21� to implement PRNGs basedon LFSR sequences over prime fields, with Mersenne orSophie-Germain prime modulus. Table I describes the gen-erators of the software package.

Table II shows some benchmark results. Apparently theperformance of both types of PRNGs competes well withpopular PRNGs such as the Mersenne Twister or RANLUX.Absolute as well as relative timings in Table II depend oncompiler and CPU architecture, but the table gives a roughperformance measure of our PRNGs.

VI. QUALITY

It is not possible to prove that a given PRNG will workwell in any simulation, but one can subject a PRNG to abattery of tests that mimic typical applications. One distin-guishes empirical and theoretical test procedures. Empiricaltests take a finite sequence of PRNs and compute certainstatistics to judge the generator as “random” or not. Whileempirical tests focus only on the statistical properties of afinite stream of PRNs and ignore all the details of the under-lying PRNG algorithm, theoretical tests analyze the PRNGalgorithm itself by number-theoretic methods and establish

TABLE I. Generators of the TRNG library. All PRNGs denotedby trng�mrg n�s� are pseudonoise sequences over Fm with n feed-back terms, trng�yarn n�s� denotes its delinearized counterpart.The prime modulus 231−1 is a Mersenne prime, whereas all othermoduli are Sophie-Germain primes.

Name m Period

trng�mrg2, trng�yarn2 231−1 �262

trng�mrg3, trng�yarn3 231−1 �293

trng�mrg3s, trng�yarn3s 231−21 069 �293

trng�mrg4, trng�yarn4 231−1 �2124

trng�mrg5, trng�yarn5 231−1 �2155

trng�yarn5s, trng�yarn5s 231−22 641 �2155

RANDOM NUMBERS FOR LARGE-SCALE DISTRIBUTED… PHYSICAL REVIEW E 75, 066701 �2007�

066701-7

Page 8: Random numbers for large-scale distributed Monte Carlo simulations · 2007. 6. 21. · Random numbers for large-scale distributed Monte Carlo simulations Heiko Bauke* Institut für

a priori characteristics of the PRN sequence. These a prioricharacteristics may be used to choose good parameter setsfor certain classes of PRNGs. The parameters of the LFSRsequences used in TRNG, for example, were found by exten-sive computer search �38� to give good results in the spectraltest �5�.

In this section we apply statistical tests to the generatorsshown in Table I to investigate their statistical properties andthose of its leapfrog substreams. The test procedures are mo-tivated by problems that occur frequently in physics, and thatare known as sensitive tests for PRNGs, namely, the clusterMonte Carlo simulation of the two-dimensional Ising modeland random walks. Furthermore we analyze the effect of theexponentiation step of the YARN generator in the birthdayspacings test. Results of additional empirical tests are pre-sented in the documentation of the software library trng �44�.

A. Cluster Monte Carlo simulations

In �33� it is reported that some “high quality” generatorsproduce systematically wrong results in a simulation of thetwo-dimensional Ising model at a critical temperature of theinfinite system by the Wolff cluster flipping algorithm �49�.Deviations are due to the bad mixing properties of the gen-erators �low macrostate entropy� �35�. An infamous examplefor such a bad PRNG is the r250 generator �29�. It combines

e LFSR sequences over F2 to produce a stream of e-bit inte-gers via

ri = ri−103 � ri−250, �22�

where � denotes the bitwise exclusive-OR operation �addi-tion in F2�.

Figure 5 shows the results of cluster Monte Carlo simu-lations of the Ising model on a 16�16 square lattice withcyclic boundary conditions for the generator r250 and itsleapfrog substreams. Each simulation was done ten times atthe critical temperature applying the Wolff cluster algorithm.The mean of the internal energy EMC, the specific heat cMC,and the empirical standard deviation of both quantities havebeen measured. The gray scale in Fig. 5 codes the deviationfrom the exact values �50�. Black bars indicate deviationslarger than four standard deviations from the exact values.The original r250 sequence and its leapfrog sequences yieldbad results if the number of processes is a power of two.Since the statistical error decreases as the number of MonteCarlo updates increases, these systematic deviations becomemore and more significant and a PRNG with bad statisticalproperties will reveal its defects. Figure 5 visualizes the in-verse relation between quantity and quality of streams ofPRNs mentioned in the Introduction.

The apparent dependency of the quality on the leapfrogparameter p is easily understood: If p is a power of two, theleapfrog sequence is just a shifted version of the originalsequence �15,30�. On the other hand, if p is not a power oftwo, the leapfrog sequence corresponds to a LFSR sequencewith a denser characteristic polynomial. For p=3, e. g., therecursion reads

ri = ri−103 � ri−152 � ri−201 � ri−250,

and for p=7,

ri = ri−103 � ri−124 � ri−145 � ri−166 � ri−187 � ri−208 � ri−229

� ri−250,

respectively. The quality of a LFSR generator usually in-creases with the number of nonzero coefficients in its char-acteristic polynomial �9,51,52�, a phenomenon that can beeasily explained theoretically �34�, and that also holds forLFSR sequences over general prime fields �35�.

While LFSR sequences over F2 with sparse characteristicpolynomial are known to fail in cluster Monte Carlo simula-tions, LFSR sequences over large prime fields with densecharacteristic polynomial perform very well in these tests.This can be seen in the lower half of Fig. 5 for thetrng�mrg3s generator; other generators from Table I per-form likewise.

B. Random walks

The simulation of the two-dimensional Ising model in theprevious section measured the quality within distinct sub-streams. Potential interstream correlations are not taken intoaccount. In �53� some tests are proposed that are designed todetect correlations between substreams. One of them is theso-called SN test.

TABLE II. Performance of various PRNGs from the TRNG li-brary version 4.0 �44� and the GNU Scientific Library �GSL� ver-sion 1.8 �45�. The test program was compiled using the Intel C++compiler version 9.1 with high optimization �option −03� and wasexecuted on a Pentium IV 3 GHz.

TRNG GSL

Generator PRNs per second Generator PRNs per second

trng�mrg2 48�106 mt19937a 41�106

trng�mrg3 43�106 r250b 85�106

trng�mrg3s 36�106 gfsr4c 77�106

trng�mrg4 38�106 ran2d 27�106

trng�mrg5 38�106 ranluxe 8�106

trng�mrg5s 23�106 ranlux389f 5�106

trng�yarn2 26�106

trng�yarn3 23�106

trng�yarn3s 16�106

trng�yarn4 20�106

trng�yarn5 22�106

trng�yarn5s 13�106

aMersenne twister �14�.bShift-register sequence over F2, ri=ri−103 � ri−250 �29� �see alsoSec. VI A�.cShift-register sequence over F2, ri=ri−471 � ri−1586 � ri−6988

� ri−9689 �30�.dNumerical recipes generator ran2 �46�.eRANLUX generator with default luxury level �47�.fRANLUX generator with highest luxury level �47�.

HEIKO BAUKE AND STEPHAN MERTENS PHYSICAL REVIEW E 75, 066701 �2007�

066701-8

Page 9: Random numbers for large-scale distributed Monte Carlo simulations · 2007. 6. 21. · Random numbers for large-scale distributed Monte Carlo simulations Heiko Bauke* Institut für

For the SN test N random walkers are considered. Theymove simultaneously and independently on the one-dimensional lattice, i.e., on the set of all integers. At eachtime step t each walker moves one step to the left or to theright with equal probability. For large times t the expectednumber SN�t� of sites that have been visited by at least onewalker reaches the asymptotic form SN�t�� t with =1/2.In the SN test the exponent is measured and compared tothe asymptotic value =1/2.

Let p be the number of parallel streams or walkers. Themotion of each random walker is determined by a leapfrogstream ti,j based on the generators from Table I. The expo-nent is determined for up to 16 simultaneous random walk-ers as a function of time t. After about 1000 time steps theexponent has converged and fluctuates around its meanvalue.

Figure 6 shows some typical plots of the exponent against time t for generator trng�yarn3. The exponent wasdetermined by averaging over 108 random walks. Results forthe other generators in Table I look similar and are omittedhere. Figure 7 presents the asymptotic values of as a func-tion of the number of walkers for generator trng�mrg5. Themean value of averaged over the interval 1500� t�2500is plotted and the error bars indicate the maximum and theminimum value of in 1500� t�2500. Almost all numeri-cal results are in good agreement with the theoretical predic-tion =1/2 and for other generators we get qualitatively the

same results. Therefore we conclude that there are no inter-stream correlations detectable by the SN test.

C. Birthday spacings test

The birthday spacings test is an empirical test procedurethat was proposed by Marsaglia �5,54�. It is specifically de-signed to detect the hyperplane structures of LFSR se-quences �39�.

For the birthday spacings test N days 0�di�M are uni-formly and randomly chosen in a year of M days. After sort-ing these independent identically distributed random num-bers into nondecreasing order b�1� ,b�2� , . . . ,b�N�, N spacings

s1 = b�2� − b�1�,

s2 = b�3� − b�2�,

]

sN = b�1� + M − b�N�

are defined. The birthday spacings test examines the distri-bution of these spacings by sorting them into nondecreasingorder s�1� ,s�2� , . . . ,s�N� and counts the number R of equalspacings. More precisely, R is the number of indices j suchthat 1� j�N and s�j�=s�j+1�. The test statistics R is a randomnumber with distribution

2 4 6 8 10 12 14 16

1

3

10

30

100

300

thou

sand

MC

upda

tes

p

EMC

cMC

EMC

cMC

EMC

cMC

EMC

cMC

EMC

cMC

EMC

cMC

10 20 30 40 50 60

0.1

0.3

1

3

10

30

mill

ion

MC

upda

tes

p

EMC

cMC

EMC

cMC

EMC

cMC

EMC

cMC

EMC

cMC

EMC

cMC

10 20 30 40 50 60

0.1

0.3

1

3

10

30

mill

ion

MC

upda

tes

p

EMC

cMC

EMC

cMC

EMC

cMC

EMC

cMC

EMC

cMC

EMC

cMC

10 20 30 40 50 60

0.1

0.3

1

3

10

30

mill

ion

MC

upda

tes

p

EMC

cMC

EMC

cMC

EMC

cMC

EMC

cMC

EMC

cMC

EMC

cMC

0 1 2 3 4|E

MC− E

exact|/∆E

MCand |c

MC− c

exact|/∆c

MC

(a)

(b)

FIG. 5. Results of Monte Carlo simulations ofthe two-dimensional Ising model at the criticaltemperature using the Wolff cluster flipping algo-rithm and the generator r250 �top� andtrng�mrg3s �bottom�. Only each pth randomnumber of the sequence was used during thesimulations. See the text for details.

RANDOM NUMBERS FOR LARGE-SCALE DISTRIBUTED… PHYSICAL REVIEW E 75, 066701 �2007�

066701-9

Page 10: Random numbers for large-scale distributed Monte Carlo simulations · 2007. 6. 21. · Random numbers for large-scale distributed Monte Carlo simulations Heiko Bauke* Institut für

pR�r� =�N3/�4M��r

eN3/�4M�r!+ O� 1

N� . �23�

For N→ and M→ such that N3 / �4M�→�=const, pR�r�converges to a Poisson distribution with mean �. After re-peating the birthday spacings test several times one can ap-ply a chi-square test to compare the empirical distribution ofR to the correct distribution �23�. If the p value of the chi-square test is very close to one or zero, the birthday spacingstest has to be considered failed �5�.

In order to make the birthday spacings test sensitive to thehyperplane structure of linear sequences, the birthdays arearranged in a d-dimensional cube of length l, i.e., M = ld.Each day bi is determined by a d tuple of consecutive PRNs�rdi ,rdi+1 , . . . ,rdi+d−1�,

bi = �j=0

d−1 � lrdi+j

m �lj . �24�

This bijective mapping transforms points in a d-dimensionalspace �0,1 , . . . , l−1�d to the linear space �0,1 , . . . ,M −1� andpoints on regular hyperplanes are transformed into regularspacings between points in �0,1 , . . . ,M −1�.

To demonstrate the failure of LFSR sequences in thebirthday spacings test we applied the test to the LFSR se-quence

ri = 17 384ri−1 + 12 391ri−2 mod 65 521, �25�

and its YARN counterpart

ri = 20 009qi mod 65 521 if qi � 0

0 if qi = 0, �26�

with

qi = 17 384qi−1 + 12 391qi−2 mod 65 521,

with the test parameters �=N3 / �4M��1 and d=6. Beyond acertain value of M the LFSR sequence starts to fail the birth-day spacings test due to its hyperplane structure �see Fig 8�.The hyperplane structure of LFSR sequences give rise tobirthday spacings that are much more regular than randomlychosen birthdays. As a consequence, we observe a value Rthat is too large �see the top of Fig. 8�.

The nonlinear transformation �15� in the YARN sequencesdestroys the hyperplane structure. In fact, even those YARNsequences whose underlying linear part fails the birthdayspacings test, passes the same test with flying colors �Fig. 8�.YARN generators start to fail the birthday spacings test only

0 500 1000 1500 2000 25000.496

0.497

0.498

0.499

0.5

0.501

0.502

0.503

0.504

t

γ t

2 walkers4 walkers8 walkers16 walkers

FIG. 6. The number of sites visited by N ran-dom walkers after t time steps is SN�t�� t. Theplot shows the exponent against time for gen-erator trng �yarn3.

2 4 6 8 10 12 14 160.499

0.4995

0.5

0.5005

0.501

p

γ ∞

FIG. 7. Results of the SN test for generatortrng�mrg5; corresponding results for other gen-erators in Table I look similar. See the text fordetails.

HEIKO BAUKE AND STEPHAN MERTENS PHYSICAL REVIEW E 75, 066701 �2007�

066701-10

Page 11: Random numbers for large-scale distributed Monte Carlo simulations · 2007. 6. 21. · Random numbers for large-scale distributed Monte Carlo simulations Heiko Bauke* Institut für

for trivial reasons, i.e., when M gets close to the period ofthe generator. For the experiment shown in Fig. 8 we aver-aged over 5000 realizations of the M birthdays, and eachbirthday was determined by d=6 random numbers. Then weexpect that the YARN generator starts to fail if M6�5000�65 5212−1, i.e., M�217.

Of course the period of the PRNGs �25� and �26� is arti-ficially small and the birthday spacings test was carried outwith these particular PRNGs to make the effect of hyper-plane structures and its elimination by delinearization as ex-plicit as possible. “Industrially sized” LFSR and YARN gen-erators like those in Table I share the same structuralfeatures, however. So the results for PRNGs �25� and �26�can be assumed to present generic results for LFSR andYARN sequences, respectively.

VII. CONCLUSIONS

We have reviewed the problem of generating pseudoran-dom numbers in parallel environments. Theoretical and prac-

tical considerations suggest to focus on linear recurrences inprime number fields, also known as LFSR sequences. Thesesequences can be efficiently parallelized by block splittingand leapfrogging, and both methods enable Monte Carlosimulations to play fair, i.e., to yield results that are indepen-dent of the degree of parallelization. We also discussed the-oretical and practical criteria for the choice of parameters inLFSR sequences. We then introduced YARN sequences,which are derived from LFSR sequences by a bijective non-linear mapping. A YARN sequence inherits the pseudonoiseproperties from its underlying LFSR sequence, but its linearcomplexity is that of a true random sequence. YARN se-quences share all the advantages of LFSR sequences, butthey pass all tests that LFSR sequences tend to fail due totheir low linear complexity.

All our results have been incorporated into TRNG, a pub-licly available software library of portable, parallelizablepseudorandom number generators. TRNG complies with theproposal for the next revision of the ISO C++ standard �48�.

8 10 12 14 16 18 20 22 2410

−2

100

102

104

106

108

log2

M

<R

>/λ

LFSR sequenceYARN sequence

8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24log

2M

LFSR sequence

8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24log

2M

YARN sequence

0 0.2 0.4 0.6 0.8 1p

(a)

(b)

FIG. 8. Due to their lattice structure, LFSRgenerators fail the birthday spacings test. Thenonlinear mapping of YARN sequences is an ef-fective technique to destroy these lattice struc-tures. Top: mean number �R� of equal spacingsbetween M randomly chosen birthdays. For agood PRNG �R� /�=1 is expected. Bottom: graycoded p value of a chi-square test for the distri-bution of R.

RANDOM NUMBERS FOR LARGE-SCALE DISTRIBUTED… PHYSICAL REVIEW E 75, 066701 �2007�

066701-11

Page 12: Random numbers for large-scale distributed Monte Carlo simulations · 2007. 6. 21. · Random numbers for large-scale distributed Monte Carlo simulations Heiko Bauke* Institut für

ACKNOWLEDGMENTS

This work was sponsored by the European Community’sFP6 Information Society Technologies program under Con-tract No. IST-001935, EVERGROW, and by the German Sci-ence Council DFG under Grant No. ME2044/1-1.

APPENDIX: SOPHIE-GERMAIN PRIME MODULI

The maximal period of a LFSR sequence �3� over theprime field Fm is given by mn−1. We are looking for primesm such that mn−1 has for a fixed n a small number of primefactors.

For m�2 the factor m−1 is even and the smallest pos-sible number of prime factors of mn−1 is three.

mn − 1 = 2m − 1

2�1 + m + m2 + ¯ + mm−1� . �A1�

If �m−1� /2 is also prime, m is called a Sophie-Germainprime or safe prime. For m=2 there exist values for n suchthat 2n−1 is also prime. Primes of form 2n−1 are calledMersenne primes, e.g., 22−1 and 286 243−1 are prime.

If n is composite and m�2 the period T is a product of atleast four prime factors.

mkl − 1 = �mk − 1���mk�l−1 + �mk�l−2 + ¯ + 1�

= �m − 1��mk−1 + mk−2 + ¯ + 1�

���mk�l−1 + �mk�l−2 + ¯ + 1� . �A2�

Actually, the number of prime factors for composite n iseven larger. The factorization of mn−1 is related to the fac-torization of the polynomial

fn�x� = xn − 1 �A3�

over Z. This polynomial can be factored into cyclotomicpolynomials �k�x� �55�,

fn�x� = xn − 1 = �d�n

�d�x� . �A4�

The coefficients of the cyclotomic polynomials are all in Z;so the number of prime factors of the factorization of mn

−1 is bounded from below by the number of cyclotomicpolynomials in the factorization of xn−1. This number equalsthe number of natural numbers that divide n. If n is primethen fn�x� is just the product �1�x��n�x�.

If m is a prime larger than two, from �1�x�=x−1 it fol-lows that mn−1 can be factorized into at least three factors,namely,

mn − 1 = 2m − 1

2 �d�1;d�n

�d�m� . �A5�

The period of a maximal period LFSR sequence with linearcomplexity n over Fm is a product of exactly three factors, ifand only if m is a Sophie-Germain prime, n is prime, and�n�m� is prime also. Note, �2�m�=m+1 is never prime, if mis an odd prime. Let us investigate some special cases inmore detail.

Case n=2. The period m2−1 is a product of 23�3 and atleast two other factors. Using the sieve of Eratosthenesmodulo 12 it can be shown, that each large enough prime mcan be written as m=12k+c, where k and c are integers suchthat gcd�12,c�=1. Factoring the period m2−1= �12k+c�2−1we find

�12k + 1�2 − 1 = 23 � 3k�6k + 1�

�12k + 5�2 − 1 = 23 � 3�2k + 1��3k + 1��A6�

�12k + 7�2 − 1 = 23 � 3�3k + 2��2k + 1�

�12k + 11�2 − 1 = 23 � 3�k + 1��6k + 5� .

Case n=4. The period m4−1 is a product of 24�3�5and at least three other factors. Using the sieve of Eras-tosthenes modulo 60 it can be shown, that each large enoughprime m can be written as m=60k+c, where k and c areintegers such that gcd�60,c�=1. Factoring the period m4−1= �60k+c�2−1 we find

TABLE III. A collection of Sophie-Germain primes m for whichmn−1 has a minimal number of prime factors.

n m

1 231−525

231−69

263−5781

263−4569

2 231−37 485

231−2085

263−927 861

263−156 981

3 231−43 725

231−21 069

263−275 025

263−21 129

4 231−305 829

231−119 565

263−3 228 621

263−156 981

5 231−46 365

231−22 641

263−594 981

263−19 581

6 231−4 398 621

231−1 120 941

263−122 358 381

263−29 342 085

7 231−50 949

231−6489

263−92 181

263−52 425

HEIKO BAUKE AND STEPHAN MERTENS PHYSICAL REVIEW E 75, 066701 �2007�

066701-12

Page 13: Random numbers for large-scale distributed Monte Carlo simulations · 2007. 6. 21. · Random numbers for large-scale distributed Monte Carlo simulations Heiko Bauke* Institut für

�60k + 1�4 − 1 = 24 � 3 � 5k�30k + 1��1800k2 + 60k + 1�

�60k + 7�4 − 1 = 24 � 3 � 5�10k + 1��15k + 2�

��360k2 + 84k + 5��A7�

�60k + 11�4 − 1 = 24 � 3 � 5�5k + 1��6k + 1�

��1800k2 + 660k + 61�

�60k + 13�4 − 1 = 24 � 3 � 5�5k + 1��30k + 7�

��360k2 + 156k + 17�;

and so on ….Case n=6. This case is similar to the n=2 and n=4 cases.

Applying the sieve of Erastosthenes modulo 84 it can be

shown that m6−1 is a product of 23�32�7 and at least fourother factors.

Cases n=3, n=5, n=7. Here n is prime, and therefore theperiod mn−1 has at least three factors. The number of factorsof mn−1 will not exceed three, if m is a Sophie-Germainprime and �n�m� is prime, too.

In Table III we present a collection of Sophie-Germainprimes m, for which mn−1 has a minimal number of primefactors. For n prime an extended table can be found in �56�.If n is prime, its factorization can be found by Eq. �A4�, andif n=2 or n=4 by Eqs. �A6� and �A7�, respectively. All theseprimes are good candidates for moduli of LFSR sequences asPRNGs in parallel applications. Note that the knowledge ofthe factorization of mn−1 is essential for an efficient test ofthe primitivity of characteristic polynomials.

�1� B. Hayes, Am. Sci. 89, 300 �2001�.�2� P. D. Coddington, NHSE Review �1996�, http://

nhse.cs.rice.edu/NHSEreview/RNG/�3� S. L. Anderson, SIAM Rev. 32, 221 �1990�.�4� G. Marsaglia, Proc. Natl. Acad. Sci. U.S.A. 61, 25 �1968�.�5� D. E. Knuth, The Art of Computer Programming, 3rd ed. �Ad-

dison Wesley Professional, Reading, MA, 1998�, Vol. 2.�6� P. L’Ecuyer, in Handbook of Computational Statistics, edited

by J. E. Gentle, W. Härdle, and Y. Mori �Springer, New York,2004�.

�7� D. H. Lehmer, Proceedings of the 2nd Symposium on Large-Scale Digital Calculating Machinery, Cambridge, MA �Har-vard University Press, Cambridge, MA, 1951�, pp. 141–146.

�8� D. E. Knuth, The Art of Computer Programming, 1st ed. �Ad-dison Wesley Professional, Reading, MA, 1969�, Vol. 2.

�9� S. Tezuka, Uniform Random Numbers: Theory and Practice�Kluwer Academic, Boston, 1995�.

�10� P. L’Ecuyer, F. Blouin, and R. Couture, ACM Trans. Model.Comput. Simul. 3, 87 �1993�.

�11� D. Jungnickel, Finite Fields: Structure and Arithmetics �Bib-liographisches Institut, Mannheim, 1993�.

�12� J. Massey, IEEE Trans. Inf. Theory 15, 122 �1969�.�13� A. J. Menezes, Handbook of Applied Cryptography, The CRC

Press Series on Discrete Mathematics and its Applications�CRC, Boca Raton, FL, 1996�.

�14� M. Matsumoto and T. Nishimura, ACM Trans. Model. Com-put. Simul. 8, 3 �1998�.

�15� S. W. Golomb, Shift Register Sequences, revised ed. �AeganPark, Laguna Hills, CA, 1982�.

�16� R. Lidl and H. Niederreiter, Introduction to Finite Fields andtheir Applications, 2nd ed. �Cambridge University Press, Cam-bridge, England, 1994�.

�17� R. Lidl and H. Niederreiter, Encyclopedia of Mathematics andits Applications, 2nd ed. �Cambridge University Press, Cam-bridge, England, 1997�, Vol. 20.

�18� J. Fillmore and M. Marx, SIAM Rev. 10, 342 �1968�.�19� N. Zierler, J. Soc. Ind. Appl. Math. 7, 31 �1959�.�20� Z.-X. Wan, Lectures on Finite Fields and Galois Rings �World

Scientific, Singapore, 2003�.

�21� A. Compagner, Am. J. Phys. 59, 700 �1991�.�22� A. Compagner, J. Stat. Phys. 63, 883 �1991�.�23� O. E. Percus and M. H. Kalos, J. Parallel Distrib. Comput. 6,

477 �1989�.�24� M. Mascagni, Parallel Comput. 24, 923 �1998�.�25� A. D. Matteis and S. Pagnutti, Parallel Comput. 13, 193

�1990�.�26� A. D. Matteis and S. Pagnutti, Parallel Comput. 14, 207

�1990�.�27� J. Eichenauer-Herrmann and H. Grothe, Numer. Math. 56, 609

�1989�.�28� P. L’Ecuyer, Commun. ACM 33, 85 �1990�.�29� S. Kirkpatrick and E. P. Stoll, J. Comput. Phys. 40, 517

�1981�.�30� R. M. Ziff, Comput. Phys. 12, 385 �1998�.�31� P. Grassberger, Phys. Lett. A 181, 43 �1993�.�32� L. N. Shchur, J. R. Heringa, and H. W. J. Blöte, Physica A

241, 579 �1997�.�33� A. M. Ferrenberg, D. P. Landau, and Y. J. Wong, Phys. Rev.

Lett. 69, 3382 �1992�.�34� H. Bauke and S. Mertens, J. Stat. Phys. 114, 1149 �2004�.�35� S. Mertens and H. Bauke, Phys. Rev. E 69, 055702�R� �2004�.�36� P. L’Ecuyer, Oper. Res. 47, 159 �1999�.�37� A. Grube, Z. Angew. Math. Mech. 53, T223 �1973�.�38� P. L’Ecuyer, ACM Trans. Model. Comput. Simul. 3, 87

�1993�.�39� P. L’Ecuyer, Proceedings of the 2001 Winter Simulation Con-

ference �IEEE Press, Washington, DC, 2001�, pp. 95–105.�40� P. L’Ecuyer and P. Hellekalek, Random and Quasi-Random

Point Sets, Lecture Notes in Statistics Vol. 138 �Springer, NewYork, 1998�, pp. 223–266.

�41� L. Schrage, ACM Trans. Math. Softw. 5, 132 �1979�.�42� W. H. Payne, J. R. Rabung, and T. P. Bogyo, Commun. ACM

12, 85 �1969�.�43� M. Mascagni and H. Chi, Parallel Comput. 30, 1217 �2004�.�44� Tina’s Random Number Generator Library, http://tina.nat.uni-

magdeburg.de/trng/�45� GNU Scientific Library, http://www.gnu.org/software/gsl/�46� W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flan-

RANDOM NUMBERS FOR LARGE-SCALE DISTRIBUTED… PHYSICAL REVIEW E 75, 066701 �2007�

066701-13

Page 14: Random numbers for large-scale distributed Monte Carlo simulations · 2007. 6. 21. · Random numbers for large-scale distributed Monte Carlo simulations Heiko Bauke* Institut für

nery, Numerical Recipes in C, 2nd ed. �Cambridge UniversityPress, Cambridge, England, 1992�.

�47� M. Lüscher, Comput. Phys. Commun. 79, 100 �1994�.�48� W. E. Brown, M. Fischler, J. Kowalkowski, and M. Paterno,

Random Number Generation in C+ +0X: A ComprehensiveProposal, version 2 �2006�; http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2032.pdf

�49� U. Wolff, Phys. Rev. Lett. 62, 361 �1989�.�50� A. E. Ferdinand and M. E. Fisher, Phys. Rev. 185, 832 �1969�.�51� D. Wang and A. Compagner, Math. Comput. 60, 363 �1993�.

�52� M. Matsumoto and Y. Kurita, ACM Trans. Model. Comput.Simul. 6, 99 �1996�.

�53� I. Vattulainen, Phys. Rev. E 59, 7200 �1999�.�54� G. Marsaglia, in Computer Science and Statistics: Proceedings

of the 16th Symposium on the Interface, Atlanta, Georgia,1984, edited by L. Billard �North Holland, Amsterdam, 1985�.

�55� Z.-X. Wan, Lectures on Finite Fields and Galois Rings �WorldScientific, Singapore, 2003�.

�56� P. L’Ecuyer, Oper. Res. 47, 159 �1999�.

HEIKO BAUKE AND STEPHAN MERTENS PHYSICAL REVIEW E 75, 066701 �2007�

066701-14