
Information Processing Letters 113 (2013) 345–349


Building Cartesian trees from free trees with k leaves

Brian C. Dean ∗, Raghuveer Mohan

Clemson University, School of Computing, Division of Computer Science, Box 340974, Clemson, SC 29630, United States

Article info

Article history: Received 28 August 2010. Received in revised form 25 February 2013. Accepted 28 February 2013. Available online 5 March 2013. Communicated by B. Doerr.

Keywords: Cartesian tree; Adaptive algorithms; Range queries; Data structures

Abstract

One can build a Cartesian tree from an n-element sequence in O(n) time, and from an n-node free tree in O(n log n) time (with a matching worst-case lower bound in the comparison model of computation). We connect these results together by describing a Cartesian tree construction algorithm based on a "bitonicity transform" running in O(n log k) time on a free tree with k leaves, noting that a path is the special case of a tree with just 2 leaves. We also provide a matching worst-case lower bound in the comparison model.

© 2013 Elsevier B.V. All rights reserved.

1. Introduction

One can define a Cartesian tree from either an n-element sequence or an edge-weighted n-node free tree. As shown in Fig. 1(a), we define the Cartesian tree arising from a sequence A_1 ... A_n by placing its minimum element A_i at the root¹; its left and right subtrees are recursively defined to be Cartesian trees of the subsequences A_1 ... A_{i−1} and A_{i+1} ... A_n. The Cartesian tree in this case is a heap-ordered binary tree whose in-order traversal yields the original sequence A_1 ... A_n. We define the Cartesian tree of a free tree T similarly, as shown in Fig. 1(b). The root node of the Cartesian tree corresponds to the edge e of minimum weight in T, and its two children are Cartesian trees of the subtrees into which T splits upon removal of e. Internal nodes in the Cartesian tree correspond to edges in T, while leaves in the Cartesian tree correspond to nodes in T. Cartesian trees have a variety of algorithmic applications, mostly due to their use in relating range minimum queries (RMQs) with lowest common ancestor (LCA) queries (see, e.g., [4,11]). In both a sequence and a free tree, the answer to an RMQ along a subsequence or subpath corresponds to the answer of an LCA query in the Cartesian tree, as indicated in Fig. 1.

* Corresponding author. E-mail address: [email protected] (B.C. Dean).

¹ For simplicity, let us assume throughout this paper that all numbers in our input are distinct. It is easy to extend the concepts and results in our discussion to the general case with duplicates present.

0020-0190/$ – see front matter © 2013 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.ipl.2013.02.014

In this work, we address the problem of building a Cartesian tree. One can easily build a Cartesian tree in O(n) time from an n-element sequence and in O(n log n) time from an n-node tree. Moreover, there is a matching Ω(n log n) lower bound on the worst-case construction time of a Cartesian tree from a free tree in the comparison model, since the Cartesian tree of a star-shaped tree is a depth-n sorted path (see Fig. 2(a)), so the process of Cartesian tree construction can be used to sort. We connect the dots between these two cases, giving an O(n log k) algorithm for Cartesian tree construction from an n-node free tree with k leaves (and we provide a matching lower bound in the comparison model). Such an algorithm could be termed an "adaptive" algorithm with respect to k, in the same manner as adaptive sorting algorithms (see, e.g., [7]), which gracefully scale in running time between O(n) and O(n log n) depending on some auxiliary parameter beyond just the problem size n that characterizes the intrinsic hardness of an instance (e.g., number of inversions).


Fig. 1. Examples of (a) the Cartesian tree of a sequence A_1 ... A_n and (b) the Cartesian tree of a free tree T. In both cases, we have highlighted the correspondence between a range minimum query along the path from x to y and the lowest common ancestor of x and y in the Cartesian tree (both shown in bold).

Fig. 2. Sorting via Cartesian tree construction from (a) a star, and (b) a spider with k sorted legs of length n/k. Note that the designation between left and right children is not particularly relevant when building a Cartesian tree from a free tree, so there are many path-shaped Cartesian tree shapes that could be valid above.

2. Background

The Cartesian tree of a sequence was initially introduced by Vuillemin [12]. It is a close relative of the treap [1], another hybrid between a binary heap and a binary search tree. Gabow et al. [9] first showed how to build a Cartesian tree from a sequence in O(n) time using a simple inductive approach: starting with a Cartesian tree representing the sequence A_1 ... A_{i−1}, we obtain the Cartesian tree representing A_1 ... A_i by inserting A_i at the bottom of the right spine and rotating it upward until we have restored the heap property. This approach spends only 2 units of work per element, 1 when it is inserted and another 1 later on when it is potentially rotated off the right spine permanently. Bender and Farach-Colton [2] give a particularly clear description of the use of Cartesian trees in relating RMQ problems in sequences with LCA problems. In particular, they give a simple approach for solving either problem with O(n) preprocessing time and O(1) query time (a result first achieved by Harel and Tarjan [10]). Reductions between LCA and RMQ problems are also described in the early literature in [3] and [9].
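The inductive right-spine construction can be sketched in Python as follows. This is a minimal illustration rather than code from [9]: the function name `cartesian_tree` and the index-array representation (`left`, `right`, `parent`, with −1 meaning "none") are assumptions made for this example, and input values are assumed distinct.

```python
def cartesian_tree(seq):
    """Build the min-rooted Cartesian tree of seq in O(n) time.

    Returns (root, left, right, parent), all expressed as indices into seq,
    with -1 standing for "no node".
    """
    n = len(seq)
    left = [-1] * n
    right = [-1] * n
    parent = [-1] * n
    spine = []  # indices on the current right spine, root first
    for i in range(n):
        last = -1
        # Pop every spine element larger than seq[i]; each element is
        # pushed once and popped at most once, hence O(n) overall.
        while spine and seq[spine[-1]] > seq[i]:
            last = spine.pop()
        if last != -1:
            # The popped chain becomes the left subtree of the new element.
            left[i] = last
            parent[last] = i
        if spine:
            # The new element becomes the bottom of the right spine.
            right[spine[-1]] = i
            parent[i] = spine[-1]
        spine.append(i)
    root = spine[0] if spine else -1
    return root, left, right, parent
```

For example, on the sequence 3, 1, 4, 0, 5, 2 the root is index 3 (the value 0), and the range minimum between indices 0 and 2 is found at index 1, which is the lowest common ancestor of indices 0 and 2 in the returned tree.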

Cartesian trees of free trees were introduced by Chazelle [5] and then subsequently rediscovered by Demaine et al. [6], who note that the Ω(n log n) comparison-based lower bound on their worst-case construction time applies even to trees of bounded degree, and also describe how to build a Cartesian tree from a free tree in only O(n) time after first sorting its edge weights as a preprocessing step. It is useful to note that the Cartesian tree of a free tree T reflects precisely the hierarchical structure of the merging operations performed by Kruskal's minimum (or rather maximum, in this case) spanning tree algorithm, when executed on T. For this reason, the authors suggest that the term "Kruskal tree" might also be well-suited for describing such a Cartesian tree.
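The correspondence with Kruskal's algorithm can be made concrete with a short Python sketch. Note that this is not the O(n) algorithm of Demaine et al. [6]: it substitutes a plain union-find structure (with path halving only) and an explicit sort, so it runs in O(n log n) time. The function name `kruskal_tree` and the node numbering (tree nodes 0..n−1 become leaves, and the internal node for each edge receives a fresh id starting at n) are assumptions made for illustration.

```python
def kruskal_tree(n, edges):
    """Cartesian tree of a free tree via merges in decreasing weight order.

    edges: list of (u, v, w) forming a tree on nodes 0..n-1, distinct weights.
    Returns (root, children, label): children[x] is the pair of children of
    internal node x, and label[x] is the edge (u, v, w) it corresponds to.
    """
    uf = list(range(n))   # union-find parent pointers
    top = list(range(n))  # top[r]: Cartesian-tree node representing component r

    def find(x):
        while uf[x] != x:
            uf[x] = uf[uf[x]]  # path halving
            x = uf[x]
        return x

    children, label = {}, {}
    next_id = n
    # Heaviest edges merge first, so they end up deepest in the tree;
    # the minimum-weight edge is merged last and becomes the root.
    for u, v, w in sorted(edges, key=lambda e: e[2], reverse=True):
        ru, rv = find(u), find(v)
        children[next_id] = (top[ru], top[rv])
        label[next_id] = (u, v, w)
        uf[ru] = rv
        top[rv] = next_id
        next_id += 1
    root = next_id - 1 if edges else 0
    return root, children, label
```

On a star with center 0 and edge weights 5, 2 and 9, the result is a path-shaped tree whose internal nodes carry the weights 2, 5, 9 from the root downward, which is the sorting behavior depicted in Fig. 2(a).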

3. Lower bounds

Theorem 1. In the comparison model, there is an Ω(n log k) worst-case lower bound for building a Cartesian tree from an n-node free tree with k leaves.

Proof. The Cartesian tree resulting from a "spider" with k incident sorted paths of length n/k is one long sorted path (Fig. 2(b)). Hence, any algorithm for Cartesian tree construction from an n-node free tree with k leaves can be used to solve the more fundamental problem of merging k sorted lists on n total elements. Moreover, this k-way merging problem has an Ω(n log k) worst-case lower bound in the comparison model, since we can encode n/k independent k-element sorting problems (requiring Ω((n/k) · k log k) = Ω(n log k) worst-case time) into a k-way merging problem with k lists each of size n/k. To do this, we regard the elements of our n/k sorting problems to be the columns of a k × n/k matrix A, whose rows are treated as being in sorted order; that is, two elements not initially in the same column are compared by their initial column indices, rather than by their values. Only elements in the same initial column are compared by value. By merging the rows of A together, this effectively solves each of our initial sorting problems. □

Fig. 3. In (a), we show the process of contracting an outflanked subsegment down to a single marked node x, after which we build a Cartesian tree (in which x will be a leaf) and then re-expand x by replacing it with a Cartesian tree of the contracted subsegment. The end result is a Cartesian tree of our original tree. In (b), we show how to reduce any segment to a bitonic segment by contracting outflanked subsegments down to marked nodes.

4. Exploiting bitonicity

Let us call a path p through a free tree T a segment if its endpoints are either leaves or nodes of degree ≥ 3, and its interior nodes are all of degree 2. A tree with k leaves has at most k − 2 nodes of degree ≥ 3. Since segments are disjoint, we could replace each segment with an edge to obtain a tree with O(k) nodes and edges, and hence there are O(k) segments. We say a sequence is bitonic if it decreases to its minimum value, then increases, and we say a free tree T is bitonic if the sequence of edge weights along each of its segments is bitonic.
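To make these definitions concrete, the following Python sketch decomposes a free tree into its segments and tests a weight sequence for bitonicity. The helper names `segments` and `is_bitonic` are hypothetical, nodes are assumed to be integers, and edge weights are assumed distinct.

```python
from collections import defaultdict

def segments(edges):
    """Return the segments of a free tree as (end_a, end_b, weights) triples.

    edges: list of (u, v, w). Segment endpoints are nodes of degree != 2
    (leaves or nodes of degree >= 3); interior nodes all have degree 2.
    """
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    ends = [x for x in adj if len(adj[x]) != 2]
    result = []
    for s in ends:
        for nxt, w in adj[s]:
            prev, cur, weights = s, nxt, [w]
            while len(adj[cur]) == 2:  # walk through degree-2 interior nodes
                (a, wa), (b, wb) = adj[cur]
                step, ws = (a, wa) if a != prev else (b, wb)
                prev, cur = cur, step
                weights.append(ws)
            if s < cur:  # every segment is reached from both ends; keep one copy
                result.append((s, cur, weights))
    return result

def is_bitonic(ws):
    """True if ws decreases to its minimum value and then increases."""
    i = ws.index(min(ws))
    falling = all(ws[j] > ws[j + 1] for j in range(i))
    rising = all(ws[j] < ws[j + 1] for j in range(i, len(ws) - 1))
    return falling and rising
```

A tree is then bitonic, in the sense defined above, exactly when `is_bitonic` holds for the weight list of every triple returned by `segments`.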

Theorem 2. The Cartesian tree of a free tree with k leaves can be constructed in O(n log k) time.

To prove this theorem, we note that a bitonic free tree with k leaves can easily be converted to a Cartesian tree in O(n log k) time, since it takes only O(n log k) time to sort all its edge weights, after which we apply the approach of Demaine et al. [6]. Sorting the edge weights is nothing more than an instance of the k-way merging problem, which can be solved in O(n log k) time, for example, by using a binary heap to repeatedly select and remove the minimum leading element from the k lists in O(log k) time per element. We now argue that the problem of constructing a Cartesian tree of any free tree can be reduced to the bitonic case in only O(n) time, thereby completing our construction algorithm.
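The heap-based k-way merge mentioned above can be sketched as follows, using Python's standard `heapq` module. The function name `k_way_merge` is an assumption for this example; each input list is assumed to be sorted in increasing order.

```python
import heapq

def k_way_merge(lists):
    """Merge k sorted lists holding n elements in total in O(n log k) time."""
    # The heap holds one (value, list index, position) entry per non-empty
    # list, so it never contains more than k entries.
    heap = [(lst[0], i, 0) for i, lst in enumerate(lists) if lst]
    heapq.heapify(heap)
    out = []
    while heap:
        value, i, j = heapq.heappop(heap)  # minimum leading element
        out.append(value)
        if j + 1 < len(lists[i]):
            # Advance within list i and reinsert its new leading element.
            heapq.heappush(heap, (lists[i][j + 1], i, j + 1))
    return out
```

In practice the standard library's `heapq.merge` provides the same functionality lazily; the explicit version is shown only to expose the O(log k) cost per element.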

We begin by defining an operation known as a contraction, whereby we replace a subsegment (a contiguous piece of a segment) by a single node, as shown in Fig. 3(a). Whenever we contract a subsegment down to a single node, we mark the node to record the subsegment it represents, in order to facilitate re-expansion of the node in the future.

Only certain subsegments are acceptable to contract. Let us call a subsegment proper if it is strictly contained within a larger segment (that is, containing neither of the two endpoint edges of its larger segment). We call a proper subsegment S outflanked if the weights of the edges in S are all larger than the weights of the two edges flanking S. As shown in Fig. 3(a), contraction of a proper subsegment S is a natural operation to consider in conjunction with Cartesian tree construction:

Lemma 1. Consider any tree T and an outflanked subsegment S. If we contract S to a marked node x, build a Cartesian tree (in which x will be a leaf), and then re-expand x by replacing it with a Cartesian tree for S, then we obtain a Cartesian tree of the original tree T.


Proof. Let us build a Cartesian tree according to the top-down recursive definition, removing in each step a minimum-weight edge from the free tree to use as the root of the Cartesian tree, then breaking the free tree into two pieces that are recursively processed. Note that S must remain fully intact during this process until both of its flanking edges have been processed, leaving a subtree of the resulting Cartesian tree containing precisely the Cartesian tree of S. If we had instead contracted S to a marked node x before building our Cartesian tree, then the top-down procedure would have produced the same result, except with x in the same location formerly occupied by the subtree corresponding to S. It is therefore equivalent to consider S as being unified into a single node x during the Cartesian tree construction process, after which x is replaced with a Cartesian tree for S. □

Using this insight, we can effectively reduce every segment S in our tree into a bitonic segment by contracting outflanked subsegments, as shown in Fig. 3(b). To do this, scan S from one endpoint up to its minimum-weight edge, keeping track of the minimum edge weight along the way. Outflanked subsegments lie between edges that reset this running minimum value. We then repeat the process from the other endpoint of the segment. By contracting all outflanked subsegments, S becomes bitonic. Since Cartesian trees of sequences can be computed in linear time, only O(n) total time is required to contract and later re-expand all outflanked subsegments.
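The two scans can be sketched in Python as below, operating on the list of edge weights along one segment. The function name `bitonic_reduce` and its return format are hypothetical: it reports the indices of the edges that survive, together with the inclusive index ranges of the outflanked subsegments that would be contracted to marked nodes. Distinct weights are assumed, and the actual contraction and re-expansion steps are left out.

```python
def bitonic_reduce(ws):
    """Identify outflanked subsegments of a segment with edge weights ws.

    Returns (keep, contracted): keep lists the indices of edges that remain,
    and contracted lists inclusive (lo, hi) index ranges to contract.
    """
    m = ws.index(min(ws))
    keep = []
    running = float("inf")
    for i in range(m + 1):  # scan from the left endpoint up to the minimum edge
        if ws[i] < running:  # this edge resets the running minimum
            running = ws[i]
            keep.append(i)
    from_right = []
    running = float("inf")
    for i in range(len(ws) - 1, m, -1):  # repeat from the right endpoint
        if ws[i] < running:
            running = ws[i]
            from_right.append(i)
    keep.extend(reversed(from_right))
    # Every gap between consecutive kept edges is an outflanked subsegment.
    contracted = [(a + 1, b - 1) for a, b in zip(keep, keep[1:]) if b - a > 1]
    return keep, contracted
```

For the weights 7, 9, 4, 8, 6, 1, 5, 3, 10, the kept edges have weights 7, 4, 1, 3, 10, which is a bitonic sequence, and the contracted ranges are (1, 1), (3, 4) and (6, 6). Both endpoint edges are always kept, so every contracted range is a proper subsegment.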

5. A pointer-based implementation

All of the steps in our construction algorithm can run in the simplistic pointer machine model except for the O(n) algorithm of Demaine et al. [6] that builds a Cartesian tree from a free tree once its edges are sorted (this algorithm uses a decremental connectivity structure that depends on features of the RAM model). However, we note that since our ultimate target running time is only O(n log k), we can use a slightly weaker approach for decremental connectivity to obtain a purely pointer-based solution for Cartesian tree construction.

To describe our construction, let us first review how to build a Cartesian tree from a free tree T in O(n log n) time in a simple top-down manner. This is done by repeatedly removing edges from T (in increasing order of weight), creating a forest F whose components are repeatedly split apart until F has been decomposed into n isolated nodes. We store the edge weights in F in a binary heap H, so locating the minimum edge at each step takes O(log n) time. As we decompose F, we simultaneously build out a Cartesian tree whose leaves correspond to components in F. Each time we split a component C in F on some edge e, we expand the corresponding leaf in our Cartesian tree by replacing it with a node corresponding to e having children corresponding to the two sub-components obtained by splitting C on e. To implement this efficiently, we need to use a decremental connectivity data structure that can quickly identify the component of F to which each newly-extracted edge from H belongs. This is easy to do in O(log n) amortized time per split operation, by the standard trick (e.g., see [8]) of relabeling the edges in the smaller component created by the split (we can determine which is smaller by traversing both in parallel, stopping when one traversal terminates).
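A simplified Python sketch of this top-down procedure is given below. It is illustrative only: the names `walk`, `smaller_side` and `top_down_cartesian` are hypothetical, component labels are stored on nodes, and the two traversals are interleaved one node at a time rather than one edge at a time, so the sketch conveys the relabel-the-smaller-piece idea without being a careful implementation of the amortized bound. Nodes are assumed to be 0..n−1 with distinct edge weights.

```python
import heapq

def walk(start, adj):
    """Yield the nodes of start's component one at a time (iterative DFS)."""
    seen, stack = {start}, [start]
    while stack:
        x = stack.pop()
        yield x
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                stack.append(y)

def smaller_side(u, v, adj):
    """Traverse from u and v in lockstep; return the side that finishes first."""
    gen_u, gen_v = walk(u, adj), walk(v, adj)
    side_u, side_v = [], []
    while True:
        x = next(gen_u, None)
        if x is None:
            return side_u
        side_u.append(x)
        y = next(gen_v, None)
        if y is None:
            return side_v
        side_v.append(y)

def top_down_cartesian(n, edges):
    """Return (root, children): internal nodes are keyed by edge weight and
    children[w] = [left, right], each a weight or a ('leaf', node) pair."""
    adj = {x: set() for x in range(n)}
    for u, v, _ in edges:
        adj[u].add(v)
        adj[v].add(u)
    comp = [0] * n    # component label of every node
    slot = {0: None}  # slot[c]: (parent weight, side) where component c hangs
    children, root, fresh = {}, None, 1
    heap = [(w, u, v) for u, v, w in edges]
    heapq.heapify(heap)
    while heap:
        w, u, v = heapq.heappop(heap)  # minimum remaining edge
        adj[u].discard(v)
        adj[v].discard(u)
        old = comp[u]
        for x in smaller_side(u, v, adj):  # relabel only the smaller piece
            comp[x] = fresh
        children[w] = [None, None]
        if slot[old] is None:
            root = w
        else:
            pw, side = slot[old]
            children[pw][side] = w  # expand the leaf that stood for the component
        slot[comp[u]] = (w, 0)
        slot[comp[v]] = (w, 1)
        fresh += 1
    for x in range(n):  # every remaining component is an isolated node
        if slot[comp[x]] is not None:
            pw, side = slot[comp[x]]
            children[pw][side] = ('leaf', x)
    return root, children
```

Because the Cartesian tree of a free tree has no canonical left/right order, the assignment of the u-side to child 0 and the v-side to child 1 is arbitrary.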

In order to implement the approach above in only O(n log k) time, we work with a "compressed" forest having only O(k) nodes and edges, so every heap operation on H and every split takes only O(log k) (amortized) time. The compressed forest is obtained by initially applying the bitonic transformation described above, so each segment in our tree effectively consists of a bitonic sequence of edge weights. We then treat each bitonic segment as two monotonic segments, and we represent each monotonic segment s as only a single edge in our compressed forest, labeled with the minimum edge weight in s. Monotonicity is crucial, since each time we remove the minimum edge from H and perform a split, this does not create any new edges in the compressed forest; it merely removes the endpoint edge of some existing segment, resulting in a change in the label of the edge representing that segment (and a corresponding O(log k) time update to that edge's record in H). The component in the compressed forest that is split is relabeled the same way as above, by relabeling the smaller piece after the split (smaller in the compressed sense). Just as above, node labels are used to identify the component to which an edge belongs, when it is chosen from H. A minor but noteworthy difference from the approach above is that when a split occurs, this may not remove an edge from the compressed forest if the compressed edge represents a segment containing more than one original edge. In this case, after the split there will be the same total number of edges and one new node among the two resulting components in the compressed forest. Each time we relabel a node, the component of that node therefore shrinks from its original size of x nodes to at most x/2 + 1 nodes, giving a total bound of O(n log k) work spent relabeling all nodes. The total running time for our pointer-based construction algorithm is therefore only O(n log k).

Acknowledgements

This research was partially supported by NSF award CCF-0845593.

References

[1] Cecilia R. Aragon, Raimund Seidel, Randomized search trees, Algorithmica 16 (1996) 464–497.

[2] Michael A. Bender, Martín Farach-Colton, The LCA problem revisited, in: Proceedings of Latin American Theoretical Informatics (LATIN), 2000, pp. 88–94.

[3] Omer Berkman, Dany Breslauer, Zvi Galil, Baruch Schieber, Uzi Vishkin, Highly parallelizable problems (extended abstract), in: Proceedings of the 21st Annual Symposium on Theory of Computation (STOC), 1989, pp. 309–319.

[4] Prosenjit Bose, Anil Maheshwari, Giri Narasimhan, Michiel Smid, Norbert Zeh, Approximating geometric bottleneck shortest paths, Computational Geometry 29 (3) (2004) 233–249.

[5] Bernard Chazelle, Computing on a free tree via complexity-preserving mappings, Algorithmica 2 (1) (1987) 337–361.

[6] Erik D. Demaine, Gad M. Landau, Oren Weimann, On Cartesian trees and range minimum queries, in: Proceedings of the 36th International Colloquium on Automata, Languages and Programming (ICALP), 2009, pp. 341–353.

[7] Vladimir Estivill-Castro, Derrick Wood, A survey of adaptive sorting algorithms, ACM Computing Surveys 24 (4) (1992) 441–476.

[8] Shimon Even, Yossi Shiloach, An on-line edge-deletion problem, Journal of the ACM 28 (1) (1981) 1–4.

[9] Harold N. Gabow, Jon Louis Bentley, Robert E. Tarjan, Scaling and related techniques for geometry problems, in: Proceedings of the 16th Annual ACM Symposium on Theory of Computing (STOC), ACM Press, 1984, pp. 135–143.

[10] Dov Harel, Robert E. Tarjan, Fast algorithms for finding nearest common ancestors, SIAM Journal on Computing 13 (2) (1984) 338–355.

[11] David M. Neto, Efficient cluster compensation for Lin–Kernighan heuristics, PhD thesis, University of Toronto, 1999.

[12] Jean Vuillemin, A unifying look at data structures, Communications of the ACM 23 (4) (1980) 229–239.