2 preliminary · the standard f actorization of lyndon w ords: an av erage p oin t of view f r ed...

26

Upload: others

Post on 01-Nov-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: 2 Preliminary · The Standard F actorization of Lyndon W ords: an Av erage P oin t of View F r ed erique Bassino a Julien Cl emen t Cyril Nicaud a Institut Gasp ar d Monge Universit

The Standard Fa torization of Lyndon Words:an Average Point of ViewFr�ed�erique Bassino a Julien Cl�ement a Cyril Ni aud aaInstitut Gaspard MongeUniversit�e de Marne-la-Vall�ee77454 Marne-la-Vall�ee Cedex 2 - Fran eAbstra tA non-empty word w of fa; bg� is a Lyndon word if and only if it is stri tly smallerfor the lexi ographi al order than any of its proper suÆx. Su h a word w is eithera letter or admits a standard fa torization uv where v is its smallest proper suÆx.For any Lyndon word v, we show that the set of Lyndon words having v as rightfa tor of the standard fa torization is rational and ompute expli itly the asso iatedgenerating fun tion. Next we establish that, for the uniform distribution over theLyndon words of length n, the average length of the right fa tor v of the standardfa torization is asymptoti ally 3n=4.Key words: Lyndon word, standard fa torization, average- ase analysis, analyti ombinatori s, su ess run1 Introdu tionGiven a totally ordered alphabet A, a Lyndon word is a word that is stri tlysmaller, for the lexi ographi al order, than any of its onjugates (i.e., all wordsobtained by a ir ular permutation on the letters). Lyndon words were intro-du ed by Lyndon [19℄ under the name of \standard lexi ographi sequen es"in order to give a base for the free Lie algebra over A; the standard fa toriza-tion plays a entral role in this framework (see [17℄, [22℄, [23℄). More pre iselyto a Lyndon word w is asso iated a binary tree T (w) re ursively built in thefollowing way: if w is a letter, then T (w) is a leaf labeled by w, otherwise T (w)Email addresses: bassino�univ-mlv.fr (Fr�ed�erique Bassino), lementj�univ-mlv.fr (Julien Cl�ement), ni aud�univ-mlv.fr (Cyril Ni aud).Preprint submitted to Elsevier S ien e 17 Mar h 2003

Page 2: 2 Preliminary · The Standard F actorization of Lyndon W ords: an Av erage P oin t of View F r ed erique Bassino a Julien Cl emen t Cyril Nicaud a Institut Gasp ar d Monge Universit

is an internal node having T (u) and T (v) as hildren where u � v is the stan-dard fa torization of w. This stru ture en odes a nonasso iative operation,either a ommutator in the free group [6℄, or a Lie bra keting [17℄; both on-stru tions leads to bases of the free Lie algebra. The average omplexity ofthe algorithms omputing these bases is basi ally determined by the averageheight of these trees.One of the basi properties of the set of Lyndon words is that every word isuniquely fa torizable as a non in reasing produ t of Lyndon words. As thereexists a bije tion between Lyndon words over an alphabet of ardinality kand irredu ible polynomials over Fk [14℄, lots of results are known about thisfa torization: the average number of fa tors, the average length of the longestfa tor [10℄ and of the shortest [21℄.Several algorithms deal with Lyndon words. Duval gives in [8℄ an algorithmthat omputes, in linear time, the fa torization of a word into Lyndon words.There exists [13℄ an algorithm generating all Lyndon word up to a given lengthin lexi ographi al order. This algorithm runs in a onstant average time (see[3℄).In Se tion 2, we de�ne more formally Lyndon words and give some enumerativeproperties of these sets of words. Then we introdu e the standard fa torizationof a Lyndon word w whi h is the unique ouple of Lyndon words u, v su hthat w = uv and v is of maximal length.In Se tion 3, we study the set of Lyndon words of fa; bg� having a given rightfa tor in their standard fa torization and prove that it is a rational language.We also ompute its asso iated generating fun tion. But as the set of Lyndonwords is not ontext-free [1℄, we are not able to dire tly derive asymptoti properties from these generating fun tions. Consequently in Se tion 4 we useprobabilisti te hniques and results from analyti ombinatori s (see [11℄) inorder to ompute the average length of the fa tors of the standard fa torizationof Lyndon words.Se tion 5 is devoted to algorithms and experimental results. We give an al-gorithm to generate randomly for uniform distribution a Lyndon word of agiven length and another one related to the standard fa torization of a Lyndonword. Finally experiments are given whi h on�rm our results and give hintsof further studies. 2

Page 3: 2 Preliminary · The Standard F actorization of Lyndon W ords: an Av erage P oin t of View F r ed erique Bassino a Julien Cl emen t Cyril Nicaud a Institut Gasp ar d Monge Universit

2 PreliminaryWe denote A� the free monoid over the alphabet A = fa; bg obtained by all�nite on atenations of elements of A. The length jwj of a word w is thenumber of the letters w is produ t of, jwja is the number of o urren es ofthe letter a in w. We onsider the lexi ographi al order < over all non-emptywords of A� de�ned by the extension of the order a < b over A.We re ord two properties of this order(i) For any word w of A�, u < v if and only if wu < wv.(ii) Let x; y 2 A� be two words su h that x < y. If x is not a pre�x of y thenfor every x0; y0 2 A� we have xx0 < yy0.By de�nition, a Lyndon word is a primitive word (i.e, it is not a power ofanother word) that is minimal, for the lexi ographi al order, in its onjugate lass (i.e, the set of all words obtained by a ir ular permutation). The set ofLyndon words of length n is denoted by Ln and L = [nLn.L = fa; b; ab; aab; abb; aaab; aabb; abbb;aaaab; aaabb; aabab; aabbb; ababb; abbbb; : : : gEquivalently, w 2 L if and only if8u; v 2 A+; w = uv ) w < vu:A non-empty word is a Lyndon word if and only if it is stri tly smaller thanany of its proper suÆxes.Proposition 1 A word w 2 A+ is a Lyndon word if and only if either w 2 Aor w = uv with u; v 2 L, u < v.Theorem 2 (Lyndon) Any word w 2 A+ an be written uniquely as a non-in reasing produ t of Lyndon words:w = `1`2 : : : `n; `i 2 L; `1 � `2 � � � � � `n:Moreover, `n is the smallest suÆx of w.The number Card(Ln) of Lyndon words of length n over A (see [17℄) isCard(Ln) = 1nXdjn �(d) Card(A)n=d;where � is the Moebius fun tion de�ned on N nf0g by �(1) = 1, �(n) = (�1)iif n is the produ t of i distin t primes and �(n) = 0 otherwise.3

Page 4: 2 Preliminary · The Standard F actorization of Lyndon W ords: an Av erage P oin t of View F r ed erique Bassino a Julien Cl emen t Cyril Nicaud a Institut Gasp ar d Monge Universit

When Card(A) = 2, we obtain the following estimateCard(Ln) = 2nn �1 +O �2�n=2�� :For w 2 LnA a Lyndon word not redu ed to a letter, the pair (u; v), u; v 2 Lsu h that w = uv and v of maximal length is alled the standard fa torization.The words u and v are alled the left fa tor and right fa tor of the standardfa torization.Equivalently, the right fa tor v of the standard fa torization of a Lyndonword w whi h is not redu ed to a letter an be de�ned as the smallest propersuÆx of w.Example 3 aaabaab = aaab � aab; aaababb = a � aababb; aabaabb = aab �aabb.3 Counting Lyndon words with a given right fa torIn this se tion, we prove that the set of Lyndon words with a given rightfa tor in their standard fa torization is a rational language and ompute itsgenerating fun tion. The te hniques used in the following basi ally ome from ombinatori s on words.Let w = vabi be a word ontaining one a and ending with a sequen e of b.The word R(w) = vb is the redu ed word of w.For any Lyndon word v, we de�ne the setXv = fv0 = v; v1 = R(v); v2 = R2(v); : : : ; vk = Rk(v)g:where k = jvja is the number of o urren es of a in v. Note that Card (Xv) =jvja + 1 and vk = b.Example 4 (1) v = aabab: Xaabab = faabab; aabb; ab; bg.(2) v = a: Xa = fa; bg.(3) v = b: Xb = fbg.By onstru tion, v is the smallest element of X+v for the lexi ographi al order.Lemma 5 Every word x 2 Xv is a Lyndon word.PROOF. If v = a, then Xv = fa; bg, else any element of Xv ends by a b. In this ase, if x =2 L, there exists a de omposition x = x1x2b su h that x2bx1 � x1x2b4

Page 5: 2 Preliminary · The Standard F actorization of Lyndon W ords: an Av erage P oin t of View F r ed erique Bassino a Julien Cl emen t Cyril Nicaud a Institut Gasp ar d Monge Universit

and x1 6= ". Thus x2a is not a left fa tor of x1x2b and x2a < x1x2a. By onstru tion of Xv, as x 6= v, there exists a word w su h that v = x1x2aw. Weget that x2awx1 < x1x2aw. This is impossible sin e v 2 L. 2A ode C over A� is a set of non-empty words su h any word w of A� anbe written in at most one way as a produ t of elements of C. A set of wordsis pre�x if none of its elements is the pre�x of another one. Su h a set is a ode, alled a pre�x ode. A ode C is said to be ir ular if any word of A�written along a ir le admits at most one de omposition as produ t of wordsof C. These odes an be hara terized as the bases of very pure monoids, i.e.,if wn 2 C� then w 2 C�. For a general referen e about odes, see [2℄.Proposition 6 The set Xv is a pre�x ir ular ode.PROOF. If x; y 2 Xv with jxj < jyj, then, by onstru tion of Xv, x > y. So xis not a left fa tor of y and Xv is a pre�x ode.Moreover, for every n � 1, if w is a word su h that wn 2 X �v then w 2 X �v .Indeed if w =2 X �v , then either w is a proper pre�x of a word of Xv or w hasa pre�x in X �v . If w is a proper pre�x of a word of Xv, it is a pre�x of v andit is stri tly smaller than any word of Xv. As wn 2 X �v , w or one of its pre�xis a suÆx of a word of Xv. But all elements of Xv are Lyndon words greaterthan v, so their suÆxes are stri tly greater than v and w an not be a pre�xof a word of Xv.Now if w = w1w2 where w1 is the longest pre�x of w in X+v , then w2 is anon-empty pre�x of a word Xv, so w2 is stri tly smaller than any word of Xv.As wn 2 X �v , w2 or one of its pre�x is a suÆx of a word of Xv, but all elementsof Xv are Lyndon words greater than v, so their suÆxes are stri tly greaterthan v and w an not have a pre�x in X+v .As a on lusion, sin e Xv is a ode and for every n � 1, if wn 2 X �v then wn 2X �v , Xv is ir ular ode. 2Proposition 7 Let ` 2 L be a Lyndon word, ` � v if and only if ` 2 X+v .PROOF. If ` � v, let `1 be the longest pre�x of ` whi h belongs to X �v ,and `2 su h that ` = `1`2. If `2 6= ", we have the inequality `2`1 > ` � v,thus `2`1 > v. The word v is not a pre�x of `2 sin e `2 has no pre�x in Xv,hen e we have `2 = `02b`002 and v = `02av00. Then, by onstru tion of Xv, `02b 2 Xvwhi h is impossible. Thus `2 = " and ` 2 X+v .Conversely, if ` 2 X+v , as a produ t of words greater than v, ` � v. 25

Page 6: 2 Preliminary · The Standard F actorization of Lyndon W ords: an Av erage P oin t of View F r ed erique Bassino a Julien Cl emen t Cyril Nicaud a Institut Gasp ar d Monge Universit

Denote for a language L � A�a�1L = fw 2 A� j aw 2 Lg:Theorem 8 Let v 2 L and w 2 A�. Then awv is a Lyndon word with aw � vas standard fa torization if and only if w 2 X �v n (a�1Xv)X �v . Hen e the set Fvof Lyndon words having v as right standard fa tor is a rational language.PROOF. Assume that awv is a Lyndon word and its standard fa torizationis aw � v. By Theorem 2, wv an be written uniquely aswv = `1`2 : : : `n; `i 2 L; `1 � `2 � � � � � `n:As v is the smallest (for the lexi ographi al order) suÆx of awv, and on-sequently of wv, we get `n = v; if w = ", then n = 1, else n � 2 andfor 1 � i � n� 1, `i � v. Thus, w 2 X �v .Moreover if w 2 (a�1Xv)X �v , then aw 2 X+v \ L. Hen e aw � v whi h is ontradi tory with the de�nition of the standard fa torization. So w 2 X �v n(a�1Xv)X �v .Conversely, if w 2 X �v n (a�1Xv)X �v , thenw = x1x2 : : : xn; xi 2 Xv and aw =2 X+v :From Proposition 1, the produ t ``0 of two Lyndon words su h that ` < `0is a Lyndon word. Repla ing as mu h as possible xixi+1 by their produ twhen xi < xi+1, w an be rewritten asw = y1y2 : : : ym; yi 2 X+v \ L; y1 � y2 � � � � � ym:As aw =2 X+v , for any integer 1 � i � m, ay1 : : : yi =2 X+v .Now we prove by indu tion that aw 2 L. As y1 2 L and a < y1, ay1 2 L.Suppose that ay1 : : : yi 2 L. Then, as yi+1 2 L \ X+v , and ay1 : : : yi 2 L n X+v ,from Proposition 7, we get ay1 : : : yi < v � yi+1. Hen e ay1 : : : yi+1 2 L.So, aw 2 L.As aw 2 L n X+v , aw < v and awv 2 L. Setting v = ym+1, we havewv = y1y2 : : : ymym+1; yi 2 X �v \ L; y1 � y2 � : : : � ym+1:Moreover any proper suÆx s of awv is a suÆx of wv and an be written as s =y0iyi+1 : : : ym+1 where y0i is a suÆx of yi. As yi 2 L, y0i � yi. As yi 2 X+v , yi � vand thus s � v. Thus, v is the smallest suÆx of awv and aw � v is the standardfa torization of the Lyndon word awv.6

Page 7: 2 Preliminary · The Standard F actorization of Lyndon W ords: an Av erage P oin t of View F r ed erique Bassino a Julien Cl emen t Cyril Nicaud a Institut Gasp ar d Monge Universit

Finally as the set of rational languages is losed by omplementation, on- atenation, Kleene star operation and left quotient, for any Lyndon word v,the set Fv of Lyndon words having v right standard fa tor is a rational lan-guage. 2We de�ne the generating fun tions Xv(z) of Xv and X�v (z) of X �v :Xv(z) = Xw2Xv zjwj and X�v (z) = Xw2X �v zjwj:As the set Xv is a ode, the elements of X �v are sequen es of elements of Xv(see [11℄): X�v (z) = 11�Xv(z) :Denote by Fv(z) = Px2Fv zjxj the generating fun tion of the setFv = fawv 2 L j aw � v is the standard fa torizationg:Theorem 9 Let v be a Lyndon word. The generating fun tion of the set Fvof Lyndon words having a right standard fa tor v an be writtenFv(z) = zjvj 1 + 2z � 11�Xv(z)! :PROOF. First of all, note that any Lyndon word of fa; bg� whi h is not aletter ends with the letter b, so Fa(z) = 0. And as Xa = fa; bg, the formulagiven for Fv(z) holds for v = a.Assume that v 6= a. From Theorem 8, Fv(z) an be written asFv(z) = zjavj Xw2X �v na�1X+v zjwj:In order to transform this ombinatorial des ription involving X �v n a�1X+vinto an enumerative formula of the generating fun tion Fv(z), we prove �rstthat a�1X+v � X �v and, next that the set a�1X+v an be des ribed as a disjointunion of rational sets.If x 2 Xv n fbg, then x is greater than v and as x is a Lyndon word, itsproper suÆxes are stri tly greater than v; onsequently, writing a�1x as a non-in reasing sequen e of Lyndon word `1; : : : ; `m, we get, sin e `m � v, that forall i, `i is greater than v. Consequently from Proposition 7, for all i, `i 2 X �v andas a produ t of elements of X+v , a�1x 2 X+v . Therefore a�1 (Xv n fbg)X �v � X �v .7

Page 8: 2 Preliminary · The Standard F actorization of Lyndon W ords: an Av erage P oin t of View F r ed erique Bassino a Julien Cl emen t Cyril Nicaud a Institut Gasp ar d Monge Universit

Moreover if x1; x2 2 Xv and x1 6= x2, as Xv is a pre�x ode,a�1x1X �v \ a�1x2X �v = ;:Thus a�1 (Xv n fbg)X �v is the disjoint union of the sets (a�1xi)X �v when xiranges over Xv n fbg. Consequently the generating fun tion of the set Fv ofLyndon words having v as right fa tor satis�esFv(z) = zjvj+1 1� Xv(z)�zz1� Xv(z)and �nally the announ ed equality. 2Note that the fun tion Fv(z) is rational for any Lyndon word v. But the rightstandard fa tor runs over the set of Lyndon words whi h is not ontext-free [1℄.Therefore in order to study the average length of the fa tors in the standardfa torization of Lyndon words, we adopt another point of view.4 Main resultMaking use of probabilisti te hniques and results from analyti ombinatori s(see [11℄), we establish the following result.Theorem 10 Under the uniform distribution over the Lyndon words of length n,the average length of the right fa tor of the standard fa torization is3n4 1 + O log3 nn !! :Remark 11 The error term omes from su essive approximations at di�er-ent steps of the proof and, for this reason, it is probably overestimated (seeexperimental results in Se tion 5).First we partition the set Ln in the two following subsets: aLn�1 and LnnaLn�1.Note that aLn�1 � Ln, that is, if w is a Lyndon word then aw is also aLyndon word. Moreover if w 2 aLn�1, the standard fa torization is w = a � vwith v 2 Ln�1. As Card (Ln�1) = 2n�1n� 1 �1 +O �2�n=2�� ;8

Page 9: 2 Preliminary · The Standard F actorization of Lyndon W ords: an Av erage P oin t of View F r ed erique Bassino a Julien Cl emen t Cyril Nicaud a Institut Gasp ar d Monge Universit

the ontribution of the set aLn�1 to the mean value of the length of the rightfa tor is (n� 1)� Card (aLn�1)Card (Ln) = n2 �1 +O �2�n=2�� :The remaining part of this paper is devoted to the standard fa torization ofthe words of Ln n aLn�1 whi h requires a areful analysis.Proposition 12 The ontribution of the set Ln n aLn�1 to the mean value ofthe length of right fa tor is n4 1 +O log3 nn !! :This proposition basi ally asserts that in average for the uniform distributionover Ln n aLn�1, the length of the right fa tor is asymptoti ally n=2.The idea is to build a transformation ', whi h is an involution on almost allthe set Ln n aLn�1, su h that the sum of the lengths of standard right fa torsof w and '(w) is about the length jwj of w. The word '(w) is obtained fromw by ex hanging parti ular suÆxes of the fa tors of the standard fa torizationof w so that standard fa tors of w and '(w) have the same pre�xes.4.1 Max-run de omposition of words of L n aLFor any Lyndon word w whi h is not redu ed to a letter, there exists a positiveinteger k = k(w) su h that akb is a pre�x of w. It is also the length of thelongest runs of a's in w. Let Lk be the set of Lyndon words with a �rst run oflength k.Denote by Xk the set Xak�1b namely Xk = faib j 0 � i � k� 1g. We partitionea h set Lk n aLk�1 in two sets L1k and L2kL1k = akbX �k�1(ak�1bX �k�1)+ \ (L n aL); L2k = akbX �k (akbX �k )+ \ (L n aL)depending on the standard fa torization. Indeed the standard fa torization ofa word w of Lk n aLk�1 an only be one of the followingw= akbu � ak�1bvw = akbu � akbv:This means that the right fa tor of a Lyndon word w an only begin with ak�1bor akb. Denote by K the length of the �rst run of a's of the right fa tor of w.Remark that K = k � 1 if w 2 L1k and K = k if w 2 L2k.9

Page 10: 2 Preliminary · The Standard F actorization of Lyndon W ords: an Av erage P oin t of View F r ed erique Bassino a Julien Cl emen t Cyril Nicaud a Institut Gasp ar d Monge Universit

With these notations, we introdu e a de omposition of words of L n aL alledmax-run de omposition throughout this paper. Any word w of L n aL an bewrittenw = f1 : : : fm with k = k(w), f1 2 akbX �K and fi 2 aKbX �K for 2 � i � m:The standard fa torization always o urs at a point of the max-run de ompo-sition: there exists j 2 f2; : : : ; mg su h that the standard fa torization of wis j�1Yi=1 fi � mYi=j fi:We will study this de omposition by means of analyti al tools and presentnow de�nitions and results whi h play a entral role hereafter. Let Xk(z) andX�k(z) be the generating fun tions respe tively asso iated to Xk and X �k (z)namely Xk(z) = kXi=1 zi and X�k(z) = 11�Xk(z) :The smallest pole of X�k(z) that is, from the Rou h�e theorem (see [5℄), theonly one in the unit dis is�k = 12 + �k; with �k = 12k+2 + k + 122k+3 +O k223k! :The value of �k is obtained by the bootstrapping method as in [16℄ using thefa t that �k is a root of 1� 2z + zk+1.Denoting by [zn℄F (z) the oeÆ ient of zn in F (z) and using the standardextra tion formula for rational series with a simple pole (see [11℄), we anwrite [zn℄ P (z)1�Xk(z) = P (�k)X 0k(�k) ��(n+1)k +O(1) (1)provided that �k is not a root of the polynomial P (z). Therefore we also needthe following estimate of the derivative X 0k of Xk at z = �k(X 0k(�k))�1 = 14 1� 4�2k1� 2k�k! = 14 1 + k2k+1!+O k222k! : (2)In the following subsets of Lyndon words will be enumerated by means of theelegant onstru tion of primitive y les [12℄.Proposition 13 (Primitive y les) Let C be a ode, with generating fun -tion C(z) = Pw2C zjwj. Then the generating fun tion of the primitive y les ofelements of C is Xm�1 �(m)m log 11� C(zm)! :10

Page 11: 2 Preliminary · The Standard F actorization of Lyndon W ords: an Av erage P oin t of View F r ed erique Bassino a Julien Cl emen t Cyril Nicaud a Institut Gasp ar d Monge Universit

This equation an be used dire tly to obtain several interesting generatingfun tions of sets of words(i) the set of Lyndon words taking C = fa; bg, C(z) = 2z.(ii) the set of Lyndon words beginning with stri tly less than k a's taking C =Xk, C(z) = Xk(z).(iii) the set of Lyndon words beginning with exa tly k a's taking C = akb(Xk)�,C(z) = zk+11�Xk(z) .Length k of longest runs.First we study the pre ise distribution of the length of the longest runs of a'sin a Lyndon word w. This question is strongly related to the notion of su essrun in probability theory [9℄Proposition 14 The probability pn;k that aib, with 1 � i < k, is a pre�x of aLyndon word of length n ispn;k = (1 + 2�k)�n +O �2�n=2� with �k = 12k+2 + k + 122k+3 +O k223k!:PROOF. Denote L<k the set of Lyndon words beginning with stri tly lessthan k a's L<k = fw 2 L jw � ak�1bg:The number of words of length n in L<k is the number of primitive y les ofelements in Xk of total length n. From Proposition 13, we getL<k(z) = Xm�1 �(m)m log 11�Xk(zm)!;where � is the Moebius fun tion. We set L<k(z) = Pn�1 `n;kzn. Then, di�er-entiating with respe t to z, we obtainXn�1n`n;kzn�1 = Xm�1 �(m) X 0k(zm)1�Xk(zm)zm�1:Hen e we have n`n;k = Xmjn�� nm� [zm℄ X 0k(z)1�Xk(z)z:Introdu ing �k and using Equation (1), we get`n;k = 1nXmjn�� nm� ���mk +O(1)� :11

Page 12: 2 Preliminary · The Standard F actorization of Lyndon W ords: an Av erage P oin t of View F r ed erique Bassino a Julien Cl emen t Cyril Nicaud a Institut Gasp ar d Monge Universit

Moreover as the number of divisors (see [15℄) of n is O(nÆ) for any positive Æ,we an write for any positive Æ < 1`n;k = 1nXmjn�� nm� ��mk +O �nÆ�1� :Finally repla ing �k by 1=2 + �k, we obtain`n;k = 2nn (1 + 2�k)�n +O 2n=2n ! :Making use of the following equalitiespn;k = `n;kCard(Ln) and Card(Ln) = 2nn �1 +O �2�n=2�� ;we get the announ ed result. 2Next result gives an interval to whi h belongs almost surely the length of thelongest runs of a's in a Lyndon word. In this way we restri t our ombinatorialmodel over Lyndon words, leaving apart only a negligible portion of them.Lemma 15 The length k of the longest runs of a's in a word w 2 Ln satis�esPrfk(w) 2 [log2 n� log2 log2 n� 1; 2 log2 n[g = 1� O � 1n� :PROOF. From Proposition 14, one has for the length k(w) of the longestrun of a's in a word w of LnPrfk(w) < kg = (1 + 2�k)�n +O �2�n=2� : (3)The inequality log(1 + x) > x log 2 is true for 0 < x < 1 gives after sim-ple algebra the result that is the value of k for whi h Prfk(w) < kg � 1n ,namely k = log2 n� log2 log2 n� 1.Again, in Equation (3), the inequality log(1+x) < x (true for all x) and the es-timation 2�k = 2�(k+1) �1 +O �k2�k�� give the values of k for whi h Prfk(w) <kg � 1� 1n , namely k = 2 log2 n. 2Remark 16 As Card(Ln n aLn�1) = (Card(Ln)), Lemma 15 an be statedfor Ln n aLn�1 instead of Ln.In what follows In denotes the interval [log2 n� log2 log2 n� 1; 2 log2 n[.12

Page 13: 2 Preliminary · The Standard F actorization of Lyndon W ords: an Av erage P oin t of View F r ed erique Bassino a Julien Cl emen t Cyril Nicaud a Institut Gasp ar d Monge Universit

Number of fa tors of the max-run de omposition.Now we establish a bound on the number of fa tors in the max-run de ompo-sition.Lemma 17 Let w be a Lyndon word of Ln n aLn�1 with k(w) 2 In. Thenumber m of fa tors in the max-run de omposition satis�esPrfm � 2 log2 ng = O lognn ! :PROOF. Denote L1k;�m the set of words of L1k with more than m runs of a'sof length k � 1 and L2k;�m the set of words of L2k with more than m runs ofa's of length k. We want to estimate the ratioPk2In Card((L1k;�m0 [ L2k;�m0) \ An)Pk2In Card((L1k [ L2k) \ An)for m0 = 2 log2 n.First of all (L1k [L2k)\An is the set of Lyndon words of Ln n aLn�1 beginningwith a longest run of a's of length k. From Card(Ln naLn�1) = 2n�1n (1+O( 1n))and Lemma 15, we getXk2In Card((L1k [ L2k) \ An) = 2n�1n �1 +O � 1n�� : (4)In order to estimate the remaining part of the ratio, we introdu e the setWk;mof words beginning by a longest run of a's of length k and ontaining at leastm longest runs of a's Wk;m = akb(X �k akb)m�1X �k+1:Then, denoting Wk;m(z) the generating fun tion of Wk;m,Card((L1k+1;�m [ L2k;�m) \ An) � [zn℄Wk;m(z):Indeed (L2k;�m\An) � (Wk;m\Ln) and a�1(L1k+1;�m\An) � (Wk;m\An�1)nLn�1. MoreoverCard((Wk;m \ An�1) n Ln�1) � Card((Wk;m \ An) n Ln);sin e by adding a b just after the �rst o urren e of akb we de�ne an inje tionfrom the �rst set on the se ond one. Thus, setting In = [k1; k2[, we obtainXk2InCard((L1k;�m [ L2k;�m) \ An) � k2Xk=k1�1[zn℄Wk;m(z):13

Page 14: 2 Preliminary · The Standard F actorization of Lyndon W ords: an Av erage P oin t of View F r ed erique Bassino a Julien Cl emen t Cyril Nicaud a Institut Gasp ar d Monge Universit

Moreover onsidering the ambiguous language (akbX �k+1)m, we get the follow-ing bound [zn℄Wk;m(z) � [zn℄ zk+11�Xk+1(z)!m : (5)Sin e we shall onsider m = 2 log2 n, here we an not use dire tly a formulalike in (1) to extra t oeÆ ients for this rational fun tion. So using the saddlepoint method, we establish a bound for its oeÆ ients.Lemma 18 Let F (z) be analyti fun tion su h that F (1) = 1 and F 0(1) 6= 0,and G(z) = (1 � F (z))�m. When m = O(logn) there exists < 1 su h thatfor n large enough [zn℄ 1(1� F (z))m � (1 + ) enmF 0(1)!m :PROOF. Using saddle point bound [7,20℄ on fun tion 1(1�F (z))m yields that[zn℄ 1(1� F (z))m � 1(1� F (�(n)))m (�(n))�n (6)where � is the unique positive solution in ℄0; 1[ of the equation�G0(�)G(�) = n:The last equation is equivalent to� F 0(�)1� F (�) = nm:Thus repla ing in (6) gives[zn℄ 1(1� F (z))m � �nmF 0(�)!m 1�n :Setting � = 1� x and studying Taylor oeÆ ients of F (1� x), we obtainx = mn (1 + o(1)):Using the standard estimate (1� x)n � e�nx, one an write for all > 0 andn large enough [zn℄ 1(1� F (z))m � (1 + ) nemF 0(1)!m ; on luding the proof of the lemma. 214

Page 15: 2 Preliminary · The Standard F actorization of Lyndon W ords: an Av erage P oin t of View F r ed erique Bassino a Julien Cl emen t Cyril Nicaud a Institut Gasp ar d Monge Universit

Sin e �k+1 is the smallest root of Xk+1(z)� 1 and[zn℄ zk+11�Xk+1(z)!m = 1�n�m(k+1)k+1 [zn�m(k+1)℄ 1(1�Xk+1(�k+1z))mapplying Lemma 18 and using inequality (5) one has for > 0 and m =O(logn) [zn℄Wk;m(z) � (1 + ) 1�nk+1 ne�k+1k+1mX 0k+1(�k+1)!m :Denoting by bk the last quantity, we get Pk2k=k1�1[zn℄Wk;m(z) � Pk2k=k1�1 bk.Sin e �k+1k+1 = 2�(k+1)(1 +O(k2�k)) and by Equation 2, we have ne�k+1k+1mX 0k+1(�k+1)!m = � ne2k+3m�m 1 +O mk2k !! :Moreover for k 2 In,��nk = 2n(1 + 2�(k+1) +O(k2�2k))n = 2ne�n=2k+1 1 +O nk22k!! : (7)This entails for k = O(logn) and m = O(logn),bk = (1 + ) 2ne�n=2k+2 � ne2k+3m�m 1 +O log3 nn !! :When n and m are �xed, bk is maximal for k = log2(n=m)� 2 and is equal toO(2n�m). So for m0 = 2 log2 n,Xk2In Card((L1k;�m0 [ L2k;�m0) \ An) � k2Xk=k1�1 bk = O �2nn2 logn� :Finally using (4), we obtainPk2In Card �(L1k;�m0 [ L2k;�m0) \ An�Pk2In Card((L1k [ L2k) \ An) = O lognn ! ; on luding the proof. 2Nature of the fa tors of the max-run de omposition.Our goal in the following is to distinguish for the lexi ographi al order thefa tors of the max-run de omposition. Re all that any word of w 2 LnaL anbe written w = f1 : : : fm where f1 = ak(w)bw1, fi = aKbwi for i > 1, wi 2 X �K15

Page 16: 2 Preliminary · The Standard F actorization of Lyndon W ords: an Av erage P oin t of View F r ed erique Bassino a Julien Cl emen t Cyril Nicaud a Institut Gasp ar d Monge Universit

for all i and K = k(w) or k(w)� 1. The wi are alled the interleaving words.We �rst prove that all interleaving words are of length at least K.We introdu e the set PK of words w 2 X �K su h that denoting by w[i℄ the i-thletter of wK � jwj � 2K � 1 and 8i 2 fK; : : : ; jwj � 1g; w[i℄ = a:For example for K = 3, we have X3 = fb; ab; aabg and the set P3 isP3 = fbaab; abaab; bbaab; bab; abab; bbab; bbb; abb; aabg:The following formula stresses the role of the last word of XK in the fa tor-ization of words of PKPK = �[K�2j=0 Ajb aK�1b� [ �[K�2j=1 Ajb aK�2b� [ � � � [ �[K�2j=K�2Ajb ab� [ AK�1b:Usual translation to generating fun tions entailsPK(z) = KXj=2 zj+1 0� K�2Xi=K�j(2z)i1A+ z(2z)K�1= z(2z)K�1 zK+1 + 4 z2 (12)K+1 � 4 z (12)K+1(2 z � 1) (z � 1) � zz � 1 + 1! :The losed form of this formula is not as important as the fa t thatfor x = O � 12K� ; PK �12 + x� = 1 +O(Kx) and P 0k �12 + x� = O(K):(8)Lemma 19 Let w be a Lyndon word of Ln n aLn�1 with k(w) 2 In. In itsmax-run de omposition, all interleaving words are of length at least K withprobability 1� O log2 nn ! :PROOF. We distinguish two ases depending on the values of K, namely kand k� 1. More pre isely we prove that all longest runs akb in Lyndon wordsof length n are followed by words of Pk with high probability. If the longestrun akb is unique then all runs of ak�1b are also followed by words of Pk�1with high probability.We onsider the ode C = akbPk X �k and the set C$k of primitive y les overthe ode C, i.e. the set of Lyndon words beginning with k a's and su h that allo uren es of akb are followed by words of Pk. Applying Proposition 13 with16

Page 17: 2 Preliminary · The Standard F actorization of Lyndon W ords: an Av erage P oin t of View F r ed erique Bassino a Julien Cl emen t Cyril Nicaud a Institut Gasp ar d Monge Universit

C(z) = zk+1Pk(z)1�Xk(z) yields the generating fun tion of C$kC$k (z) = Xm�1 �(m)m log 1�Xk(zm)1�Xk(zm)� zm(k+1)Pk(zm)! :Moreover for words with a unique longest run of length k, all possible o - uren es of ak�1b in a�1w must be separated by words of Pk�1. So instead ofC$k we are bound to study the setD$k = �C$k n akbPkX �k � [ akbPk�1X �k�1 �ak�1bPk�1Xk�1�� :Its generating fun tion an be writtenD$k (z) = C$k (z)��k(z) with �k(z) = zk+1Pk(z)1�Xk(z)� zk+1Pk�1(z)1�Xk�1(z)� zkPk�1(z) :We shall ompare the ardinality of the set L$n = [k2In(D$k \ An) namelyCard(L$n ) = Xk2In[zn℄D$k (z)with the number of Lyndon words of length n. Let %k be the smallest root of1�Xk�1(z)� zkPk�1(z). We an prove as in Se tion 4.1 that %k is simple andbelongs to [1=2; 1[. Using the bootstrapping method and the estimates (8) ofPk and P 0k, we obtain%k = 12 + 12k+2 +O k22k! = �k +O k22k! : (9)Let $n;k = [zn℄C$k (z). By usual oeÆ ient extra tion we have $n;k = 1n 1%nk+1 � 1�nk !+O 2n=2n ! :From Equations (7) and (9) we getXk2In $n;k = 2nn �e�n=2k2+2 � e�n=2k1+1�+ 2nn Xk2In e�n=2k+2O nk22k! :By de�nition of In, e�n=2k2+2�e�n=2k1+1 = 1+O(1=n). Moreover as � n2k�2 exp(� n2k+2 )is uniformly bounded for k > 0, we obtainXk2In $n;k = 2nn 1 +O log2 nn !! : (10)17

Page 18: 2 Preliminary · The Standard F actorization of Lyndon W ords: an Av erage P oin t of View F r ed erique Bassino a Julien Cl emen t Cyril Nicaud a Institut Gasp ar d Monge Universit

On the other hand, using again oeÆ ient extra tion of rational fun tions andusing Equations (2), (7), (9) and we have[zn℄ zk+1Pk�1(z)1�Xk�1(z)� zkPk�1(z) = %k+1k Pk�1(%k) %�(n+1)kX 0k�1(%k) + (k + 1)%kkPk(%k) + %k+1k P 0k�1(%k) +O(1)[zn℄zk+1Pk(z)1�Xk(z) = �k+1k Pk(�k)X 0k(�k) 1�n+1k +O(1):As X 0k(z) = X 0k�1(z) + (k + 1)zk, we get by Equations (7) and (9)[zn℄�k(z) = 2n 12k+1e�n=2k+2O nk22k! :Again as � n2k �3 exp(� n2k+2 ) is uniformly bounded for k > 0, we obtainXk2In[zn℄�k(z) = O 2n log2 nn2 ! :Consequently using Equation (10), we get Card(L$n ) = 2nn �1 +O � log2 nn �� andCard(Ln)Card(L$n ) = 1 +O log2 nn ! :Thus almost all Lyndon words of length n belong to L$n . Finally sin e Card(LnnaLn�1) = (Card(Ln)) the property on the length of the interleaving wordsalso holds on Ln n aLn�1 with an error term of same order. 2To ompare for the lexi ographi al order two fa tors beginning with a longestrun of a's of length K, it remains to distinguish at most m = 2 log2 n inter-leaving words of PKX �K.Lemma 20 Let w be a Lyndon word of Ln n aLn�1 with k(w) 2 In having amax-run de omposition into m = O(logn) fa tors. The m interleaving wordshave pairwise distin t pre�xes in PK with probability greater than1� O log3 nn ! :PROOF. From previous lemma, interleaving words are longer than K withprobability 1�O(log2 n=n). Thus we fo us on the subsets Q$n;K of Ln n aLn�1of Lyndon words with a max-run de omposition where all interleaving wordsare in PKX �K . We shall prove that all the pre�xes in PK of these words are18

Page 19: 2 Preliminary · The Standard F actorization of Lyndon W ords: an Av erage P oin t of View F r ed erique Bassino a Julien Cl emen t Cyril Nicaud a Institut Gasp ar d Monge Universit

pairwise distin t with high probability and that the restri tion on the lengthof the interleaving words does not a�e t the order of the error term.Given a sequen e of positive integers m = (m1; : : : ; m`) and an in reasingsequen e of positive integers ! = (!1; : : : ; !`), de�ne the set Q$n;K;m;! as theset of Lyndon words w 2 Q$n;K su h that(i) w admits a de omposition into m = Pi=1mi fa tors;(ii) for i 2 f1; : : : ; `g, w has exa tlymi interleaving words with pre�xes of length!i in PK.This de�nes a partition ofQ$n;K a ording tom and !. Denote by Q 6=n;K;m;! thesubset of words of Q$n;K;m;! with interleaving words having pairwise distin tpre�xes in PK .Let S be a set of m distin t words of PK of total length N = P!imi. Thereare (m � 1)(m � 1)! possible way of ordering S so that the �rst word is notthe smallest, yielding a word of L1k, and (m� 1)! possible ways of ordering Sso that the �rst word is the smallest, yielding a word of L2k. So ompleting thewords up to length n with aKb (possibly aK+1b at the beginning) before ea hword of PK and words of X �K after ea h word of PK , we obtainhzn�m(K+1)�N i (m� 1)! 1 + (m� 1)z(1�XK(z))mLyndon words for a given set S. If the words of S are not distin t, then thelast quantity is just an upper bound for the number of Lyndon words one anobtain.Let us �x a sequen e of positive integers m = (m1; : : : ; m`) and an in reas-ing sequen e of positive integers ! = (!1; : : : ; !`) and denote by PK;n =Card(PK \ An). As8(a1; a2; : : : ; ap) 2 [0; 1℄p; pYi=1(1� ai) � 1� pXi=1 ai;we have the following hain of inequalities provided the sets Q$n;K;m;! andQ6=n;K;m;! are not emptyCard(Q 6=n;K;m;!)Card(Q$n;K;m;!) � Qi=1 �PK;!imi �Qi=1 PmiK;!i � Yi=1 1� m2iPK;!i! :Finally sin e Pm2i � (Pmi)2 and PK;!i � 2K�1 for all i, we haveCard(Q 6=n;K;m;!)Card(Q$n;K;m;!) � 1� m22K�1 :19

Page 20: 2 Preliminary · The Standard F actorization of Lyndon W ords: an Av erage P oin t of View F r ed erique Bassino a Julien Cl emen t Cyril Nicaud a Institut Gasp ar d Monge Universit

So for m = O(logn) and K > log2 n� log2 log2 n� 2 the ratio be omesCard(Q 6=n;K;m;!)Card(Q$n;K;m;!) = 1� O log3 nn ! :Sin e Q$n;K is the disjoint union of Q$n;K;m;! for all (m;!), the result is alsotrue for Q$n;K . Finally as the error term O(log3 n=n) is uniform for all subsetsQ$n;K and the error term O(log2 n=n) oming from the hypothesis on the lengthof the interleaving words is of smaller order, the property holds for words wof Ln n aLn�1 with k(w) 2 In and m = O(logn). 24.2 An involution over Ln n aLn�1We now introdu e an involution on almost all the set Ln naLn�1 su h that thesum of the lengths of the right fa tors of w and its image is approximativelyjwj.To a hieve this goal we partition the set Ln n aLn�1 in two subsets,Ln n aLn�1 = Gn [ Bn:The set Gn is the set of words in Ln n Ln�1 whose max-run de ompositionakbw1 : : : aKbwm veri�es(i) k 2 In;(ii) m < 2 log2 n;(iii) the interleaving words wi have pairwise distin t pre�xes in PK .For any word w = akbu � aKbv 2 Gn we de�ne '(w) as'(w) = akbu0v00aKbv0u00;with u = u0u00, v = v0v00 and u0 and v0 in PK .The key fa t is that, globally, ' preserves the runs of a's and the pre�xes inPK of the interleaving words of the max-run de omposition.If ` is a Lyndon word, we denote by right(`) the right fa tor of `.Lemma 21 Under the uniform distribution over Gn the average length of theright fa tor of the standard fa torization isn2 1 +O lognn !! :20

Page 21: 2 Preliminary · The Standard F actorization of Lyndon W ords: an Av erage P oin t of View F r ed erique Bassino a Julien Cl emen t Cyril Nicaud a Institut Gasp ar d Monge Universit

PROOF. We prove that ' is an involution on Gn and the sum of the lengthsof the right fa tors of a word w 2 Gn and '(w) is about jwj.Let w 2 Gn with standard fa torizationw = akbw1 : : : aKbwd�1 � aKbwd : : : aKbwmwith wi 2 PKX �K for 1 � i � m, then'(w) = akbw01w00daKbwd+1 : : : aKbwmaKw0dw001aKbw2 : : : aKbwd�1with w1 = w01w001 , wd = w0dw00d and w01; w0d in PK .By de�nition of ', '(w) 2 akbPKX �K(aKbPKX �K)+. Moreover for a word w ofGn, the position of the smallest proper suÆx of '(w) an be easily determined.Indeed ' preserves the relative order between akbw01 < aKbw0d < aKbwi for i 6=1; d. Thus '(w) is a Lyndon word and the standard fa torization of '(w) is'(w) = akbw01w00daKbwd+1 : : : aKbwm � aKw0dw001aKbw2 : : : aKbwd�1:So '(w) 2 Gn and ' is an involution on Gn: '('(w)) = w for w 2 Gn.Moreover for any word w of Gnjright(w)j+ jright('(w))j = jwj � (k �K) + jw0dj � jw01j;where k �K 2 f0; 1g.By de�nition, the lengths of pre�xes w0d and w01 are in [K; 2K�1℄, so jjw0dj � jw01jj <K. As k 2 In and k�K 2 f0; 1g we get that jjw0dj � jw01jj = O(logn). Finallyas ' is an involution on Gn we obtain2 Xw2Gn jright(w)j = Xw2Gn (jright(w)j+ jright('(w))j)= (n +O(logn)) Card(Gn); on luding the proof. 2Now we ompute the total ontribution of Ln n aLn�1 to the mean value ofthe standard right fa torXw2LnnaLn�1 jright(w)j = Xw2Gn jright(w)j+ Xw2Bn jright(w)j:Using Lemma 21 and the fa t that jright(w)j � jwj for any Lyndon word w,21

Page 22: 2 Preliminary · The Standard F actorization of Lyndon W ords: an Av erage P oin t of View F r ed erique Bassino a Julien Cl emen t Cyril Nicaud a Institut Gasp ar d Monge Universit

we getXw2LnnaLn�1 jright(w)j = n2 1 +O lognn !!Card(Gn) +O(n)� Card(Bn):Moreover Lemmas 15, 17 and 20 mat h exa tly the onditions (i), (ii) and (iii)whi h hara terize the set Gn. It leads to the estimateCard(Gn) = Card(Ln n aLn�1) 1�O log3 nn !! :Consequently we getXw2LnnaLn�1 jright(w)j = n2 Card(Ln n aLn�1) 1 +O log3 nn !! :Finally as Card(Ln n aLn�1) = Card(Ln)�12 +O �1n�� ;the total ontribution of of Ln naLn�1 to the mean value of the standard rightfa tor is n4 1 +O log3 nn !! ; on luding the proof of Proposition 12 and Theorem 10.5 Algorithms and experimental resultsIn this se tion we give two linear algorithms to generate random Lyndon wordsof a given length n and to ompute the standard fa torization of a Lyndonword.There exists an algorithm SmallestConjugate(u), proposed by Booth [18,4℄,that gives the smallest onjugate a random Lyndon word of length n in lin-ear time. We use it to make a reje t algorithm whi h is eÆ ient to generaterandomly a Lyndon word of length n:RandomLyndonWord(n) // return a random Lyndon wordstring u, v;dou = RandomWord(n); // u is a random word of Anv = SmallestConjugate(u); // v is the smallest onjugate of uuntil (length(v) == n); // v is primitivereturn v; 22

Page 23: 2 Preliminary · The Standard F actorization of Lyndon W ords: an Av erage P oin t of View F r ed erique Bassino a Julien Cl emen t Cyril Nicaud a Institut Gasp ar d Monge Universit

The algorithm RandomLyndonWord omputes uniformly a Lyndon word.Lemma 22 The average omplexity of RandomLyndonWord(n) is linear.PROOF. Ea h exe ution of the do ...until loop is done in linear time. The ondition is not satis�ed when u is a onjugate of a periodi word vp withp > 1. This happens with probability O( n2n=2 ). Thus the loop is exe uted abounded number of times in the average. 2Lemma 23 Let ` be a Lyndon word whi h is not a letter. Let `1 : : : `k be thefa torization of a�1` into a nonin reasing sequen e of Lyndon words. The rightfa tor of ` in its standard fa torization is `k.PROOF. By Theorem 2, `k is the smallest suÆx of a�1`, thus it is the small-est proper suÆx of `. 2Let the Duval(string u, int k, array pos) be the fun tion whi h omputes thefa torization of u into a nonin reasing sequen e of Lyndon words by Duval'salgorithm [8℄. It stores in an array pos of size k the positions of the fa tors.The following algorithm omputes the right fa tor of a Lyndon word ` whi his not a letterRightFa tor(string l[1::n℄)array pos;int k;Duval(l[2::n℄, k, pos); // omit the �rst letter and apply Duval's algorithmreturn l[pos[k℄::n℄; // return the last fa torThis algorithm is linear in time sin e Duval's algorithm is linear.Figures 1, 3 and 2 present some experimental results obtained with our algo-rithms.Open problemsThe results obtained in this paper are only a �rst step toward the average ase-analysis of the tree obtained from a Lyndon word by su essive standardfa torizations. In order to study the height of these trees, a better insight ofthe distribution of the right fa tors of words of Ln n aLn�1 is needed.23

Page 24: 2 Preliminary · The Standard F actorization of Lyndon W ords: an Av erage P oin t of View F r ed erique Bassino a Julien Cl emen t Cyril Nicaud a Institut Gasp ar d Monge Universit

0

2000

4000

6000

8000

10000

12000

1000 2000 3000 4000 5000 6000 7000 8000 9000 10000

leng

th o

f the

rig

ht fa

ctor

length of the Lyndon word

3*x/4

Fig. 1. Average length of the right fa tor of random Lyndon words of lengthfrom 1; 000 to 10; 000. Ea h plot is omputed with 1; 000 words. The error barsrepresent the standard deviation.

0

500

1000

1500

2000

0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000

num

ber

of h

its

size of the right factorFig. 2. Distribution of the length of the right fa tor over Ln n aLn�1. We gener-ated 100; 000 random Lyndon words of length 5; 000.Figure 3 hints a very strong equi-repartition property of the length of the rightfa tor over this set. This suggests a parti ular subdivision pro ess at ea h nodeof the fa torization tree whi h needs further investigations.Referen es[1℄ J. Berstel, L. Boasson, The set of lyndon words is not ontext-free, Bull. Eur.Asso . Theor. Comput. S i. EATCS 63 (1997) 139{140.[2℄ J. Berstel, D. Perrin, Theory of odes, A ademi Press, 1985.24

Page 25: 2 Preliminary · The Standard F actorization of Lyndon W ords: an Av erage P oin t of View F r ed erique Bassino a Julien Cl emen t Cyril Nicaud a Institut Gasp ar d Monge Universit

0

2000

4000

6000

8000

10000

12000

4950 4955 4960 4965 4970 4975 4980 4985 4990 4995 5000

Num

ber

of h

its

length of the right factorFig. 3. Zoom on the distribution of the length of the right fa tor over Ln n aLn�1.[3℄ J. Berstel, M. Po hiola, Average ost of Duval's algorithm for generatingLyndon words, Theoret. Comput. S i. 132 (1-2) (1994) 415{425.[4℄ K. S. Booth, Lexi ographi ally least ir ular substrings, Inform. Pro ess. Lett.10(4-5) (1980) 240{242.[5℄ H. Cartan, Th�eorie �el�ementaire des fon tions analytiques d'une ou plusieursvariables omplexes, Hermann, 1985.[6℄ K. Chen, R. Fox, R. Lyndon, Free di�erential al ulus IV: The quotient groupsof the lower entral series, Ann. Math. 58 (1958) 81{95.[7℄ N. G. de Bruijn, Asymptoti Method in Analysis, North Holland, 1961.[8℄ J.-P. Duval, Fa torizing words over an ordered alphabet, Journal of Algorithms4 (1983) 363{381.[9℄ W. Feller, An introdu tion to Probability Theory and Its Appli ations, 3rdEdition, Vol. 1, Wiley, 1968.[10℄ P. Flajolet, X. Gourdon, D. Panario, The omplete analysis of a polynomialfa torization algorithm over �nite �elds, Journal of Algorithms 40 (2001) 37{81.[11℄ P. Flajolet, R. Sedgewi k, Analyti ombinatori s{symboli ombinatori s,Book in preparation, (Individual hapters are available as INRIA Resear hreports at http://www.algo.inria.fr/ ajolet/publist.html) (2002).[12℄ P. Flajolet, M. Soria, The y le onstru tion, SIAM J. Dis . Math. 4 (1991)58{60.[13℄ H. Fredri ksen, J. Maiorana, Ne kla es of beads in k olors and k-ary de Bruijnsequen es, Dis rete Math. 23 (3) (1978) 207{210.25

Page 26: 2 Preliminary · The Standard F actorization of Lyndon W ords: an Av erage P oin t of View F r ed erique Bassino a Julien Cl emen t Cyril Nicaud a Institut Gasp ar d Monge Universit

[14℄ S. Golomb, Irredu ible polynomials, syn hronizing odes, primitive ne kla esand y lotomi algebra, in: Pro . Conf Combinatorial Math. and Its Appl.,Univ. of North Carolina Press, Chapel Hill, 1969, pp. 358{370.[15℄ G. Hardy, E. Wright, An Introdu tion to the Number Theory, Oxford UniversityPress, 1938.[16℄ D. Knuth, The average time for arry propagation, Indagationes Mathemati ae40 (1978) 238{242.[17℄ M. Lothaire, Combinatori s on Words, Vol. 17 of En y lopedia of mathemati sand its appli ations, Addison-Wesley, 1983.[18℄ M. Lothaire, Applied Combinatori s on Words, (in preparation), available athttp://www-igm.univ-mlv.fr/~berstel/Lothaire.[19℄ R. Lyndon, On Burnside problem I, Trans. Ameri an Math. So . 77 (1954)202{215.[20℄ A. M. Odlyzko, Handbook of Combinatori s, Elsevier, 1995, Ch. Asymptoti enumeration methods.[21℄ D. Panario, B. Ri hmond, Smallest omponents in de omposable stru tures:exp-log lass, Algorithmi a 29 (2001) 205{226.[22℄ C. Reutenauer, Free Lie algebras, Oxford University Press, 1993.[23℄ F. Ruskey, J. Sawada, Generating Lyndon bra kets: a basis for the n-thhomogeneous omponent of the free Lie algebra, Journal of Algorithms 46 (2003)21{26.

26