space-time trade-offs for some ranking and searching queries



Information Processing Letters 79 (2001) 237–241

Space-time trade-offs for some ranking and searching queries

Adrian Dumitrescu a,∗, William Steiger b

a Applied Mathematics and Statistics, SUNY, Stony Brook, NY 11794-3600, USA

b Computer Science, Rutgers University, Piscataway, NJ 08854, USA

Received 19 April 2000; received in revised form 13 October 2000

Communicated by F.B. Schneider

Abstract

We study space/time trade-offs for querying some combinatorial structures. In the first, given an arrangement of $n$ lines in general position in the plane, a query for a real number $t$ asks about $\mathrm{Rank}(t)$, the number of vertices of the arrangement with $x$-coordinates $\le t$. We show that for $K = O(n/\log n)$, after a preprocessing step that uses space $S = O(n^2/(K \log K))$, the query can be answered in time $O(n \log K)$. The second query involves the Cartesian sum of vectors $a = (a_1, \dots, a_n)$ and $b = (b_1, \dots, b_n)$. For a given real $t$, it asks about $\mathrm{Rank}(t)$, the number of sums $a_i + b_j$ which are $\le t$. We show that for some positive constant $c$ and $K \le c(\log n)/(\log\log n)$, after a preprocessing step that uses space $S = O(n^2/K^2)$, the query may be answered in time $O((n/K)\log K)$. Both results fit neatly between two obvious extremes. © 2001 Elsevier Science B.V. All rights reserved.

Keywords: Algorithms; Ranking; Searching; Computational geometry

1. Introduction

We examine two problems. In the first, given lines $\ell_1, \dots, \ell_n$, $\ell_i = \{(x, y) : y = m_i x + b_i\}$, we say they are in general position if the set $V = \{\ell_i \cap \ell_j,\ i < j\}$ of vertices has $N = \binom{n}{2}$ distinct elements. We write $y_i(t) = m_i t + b_i$ for the intercepts of the lines on the vertical line $x = t$. We consider the following two queries. Given a real number $t$, the ranking query asks for $\mathrm{Rank}(t)$, the number of vertices in the left half-plane $x \le t$. The search query asks whether there is a vertex in $V$ with $x$-coordinate equal to $t$.

* Corresponding author. Current address: University of Wisconsin, Department of Electrical Engineering and Computer Science, EMS Building, P.O. Box 784, Milwaukee, WI 53201-0784, USA.

E-mail addresses: [email protected] (A. Dumitrescu), [email protected] (W. Steiger).

0020-0190/01/$ – see front matter © 2001 Elsevier Science B.V. All rights reserved. PII: S0020-0190(00)00226-X

To answer such queries we are allowed to preprocess the lines and store any computed results using space $S$. The goal is to answer ranking queries as efficiently as possible, say in time $T(S)$. At one end of the spectrum we use minimal space $S = O(n)$, in which we store the data $(m_i, b_i)$ in order of decreasing slope. We then answer the query for $t$ in time $T = O(n \log n)$ by counting $I(t)$, the number of inversions in the permutation that sorts $y_i(t)$, the intercepts at $x = t$. It is familiar [3] that $I(t) = \mathrm{Rank}(t)$. At the other extreme we use space $S = O(n^2)$ to store the $x$-coordinates from $V$ in increasing order, say $x_1 \le \cdots \le x_N$. The query about $\mathrm{Rank}(t)$ can now be answered by binary search in $T = O(\log n)$ time. One question [6] is whether, using space $S = o(n^2)$, the ranking query can be answered in time $T = o(n \log n)$; i.e., is there anything between the extremes? In the next section we show how to answer queries in time $T = O(n \log K)$ using only space $S = O(n^2/(K \log K))$, for $K = O(n/\log n)$, and thus give an affirmative answer as long as $K = \Theta(\log n)$. For this purpose we store partial order information of the lines at an equally spaced sample of all the $x$-coordinates of the intersection points. On the other hand, the trade-off is quite incomplete, and it is even open whether space $S = O(n^{2-\varepsilon})$ (with $0 < \varepsilon < 1$) gives any advantage over space $S = O(n)$ in settling ranking queries. These results extend in a straightforward way to the search version.
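The linear-space extreme for line arrangements can be illustrated with a short sketch. It sorts the lines by decreasing slope and counts inversions among the intercepts at $x = t$ with a merge sort. The function name `rank_by_inversions` and the `(m, b)` tuple representation are assumptions of this sketch, as is the choice to count a pair with equal intercepts as an inversion so that a vertex lying exactly at $x = t$ is included.

```python
def rank_by_inversions(lines, t):
    """Count vertices with x-coordinate <= t among lines given as (m, b) pairs.

    Assumes distinct slopes. Runs in O(n log n) time and O(n) space.
    """
    # Decreasing slope is the bottom-to-top order of the lines at x = -infinity.
    ordered = sorted(lines, key=lambda mb: -mb[0])
    ys = [m * t + b for m, b in ordered]

    def sort_and_count(seq):
        # Returns (sorted seq, number of pairs p < q with seq[p] >= seq[q]).
        if len(seq) <= 1:
            return seq, 0
        mid = len(seq) // 2
        left, inv_left = sort_and_count(seq[:mid])
        right, inv_right = sort_and_count(seq[mid:])
        merged, inv = [], inv_left + inv_right
        i = j = 0
        while i < len(left) and j < len(right):
            if left[i] < right[j]:
                merged.append(left[i])
                i += 1
            else:
                # right[j] <= left[i], hence <= every remaining left element.
                inv += len(left) - i
                merged.append(right[j])
                j += 1
        merged.extend(left[i:])
        merged.extend(right[j:])
        return merged, inv

    return sort_and_count(ys)[1]
```

For the three lines $y = x$, $y = -x$ and $y = 0.5$, whose vertices have $x$-coordinates $-0.5$, $0$ and $0.5$, the call `rank_by_inversions([(1.0, 0.0), (-1.0, 0.0), (0.0, 0.5)], 0.0)` counts the two vertices at $x \le 0$.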

In the second problem we are given vectors $a = (a_1, \dots, a_n)$ and $b = (b_1, \dots, b_n)$ and we consider the Cartesian sum matrix $\Sigma = (\sigma_{ij})$, $\sigma_{ij} = a_i + b_j$. The ranking query for $t$ asks for $\mathrm{Rank}(t)$, the number of sums in $\Sigma$ that are $\le t$, and the searching query asks whether there is a pair $(i, j)$ with $a_i + b_j = t$. We preprocess $a$ and $b$ and store any desired results using space $S$. Now, given a query $t$, we want to compute $\mathrm{Rank}(t)$ as efficiently as possible, say in time $T$. In one extreme, using space $O(n)$ in which we store the elements of $a$ and $b$, each in increasing order, we can easily count $\mathrm{Rank}(t)$ in time $T = O(n)$. At the other extreme, by computing all elements of $\Sigma$ and storing them in sorted order using space $S = O(n^2)$, the queries can be settled in time $T = O(\log n)$. A natural question [2] is whether there is anything between these extremes; i.e., whether using space $S = o(n^2)$, we may answer a query in time $T = o(n)$. In Section 3 we show that after preprocessing the Cartesian sum and using space $S = O(n^2/K^2)$, the query may be answered in time $T = O((n/K)\log K)$ as long as $K \le c(\log n)/(\log\log n)$, for some positive constant $c$. Thus we give an affirmative answer by selecting $K = c(\log n)/(\log\log n)$. The question of whether $S = O(n^{2-\varepsilon})$ (with $0 < \varepsilon < 1$) allows $T = o(n)$ is open.

2. The rank/search query for line arrangements

Given lines $\{\ell_1, \dots, \ell_n\}$, $\ell_i = \{(x, y) : y = m_i x + b_i\}$, let $V = \{\ell_i \cap \ell_j\}$ denote the vertex set and $X = \{x_1, \dots, x_N\}$ the $x$-coordinates of the $N = \binom{n}{2}$ vertices, in sorted order. We write $y_i(t) = m_i t + b_i$ for the intercepts of the lines on the vertical line $x = t$.

We begin by observing that if the space is to be $o(n^2)$, we could only compute and save a sparse subset of $x_1 \le \cdots \le x_N$. So take an integer $K$ and suppose that preprocessing computes

$$U = \{u_i = x_{iK},\ i = 1, \dots, N/K\},$$

the $x$-coordinates of the stored vertices. Now, given a query $t$, we could find the largest $j$ with $u_j \le t$ in $T = O(\log(N/K))$ and know that

$$\mathrm{Rank}(t) = jK + \bigl|\{x_i \in [u_j, u_{j+1}) : x_i \le t\}\bigr|.$$

Moreover we could count $|\{x_i \in [u_j, u_{j+1}) : x_i \le t\}|$ in time $T = O(K \log n)$ using the Bentley–Ottmann line sweep [1], so if we want $T = o(n \log n)$, we would need $K = o(n)$.

On the other hand, the line-sweep algorithm requires the sorted order of intercepts at $u_j$. To sort $\{y_i(u_j),\ i = 1, \dots, n\}$ once the query is presented would already take too much time, so we would need to be able to obtain this sorted order from the preprocessing data in $o(n \log n)$. Finally we observe that we could not store the $N/K$ permutations in $S = o(n^2)$ because of the above requirement on $K$. Our solution is to compute and store the count at a sparse set of vertices, and partial order information on the set of lines at an even sparser (sub-linear) set of vertices.

Let $K = O(n/\log n)$ be an integer parameter to be specified later, and for simplicity assume that $K \mid n$. For a positive integer $d$, a $d$-sample of $X$ is the (sorted) sublist $x_{id}$, $i \ge 1$. Given integers $\delta < \Delta$, assume that $\delta \mid \Delta$ and $\Delta \mid n$. If we have a $\delta$-sample and a $\Delta$-sample of $X$, the $\delta$-sample forms a finer subdivision of the set $X$, and the $\Delta$-sample is a subset of the $\delta$-sample.

Following Cole et al. [3], a $K$-grouping of the lines $\{\ell_1, \dots, \ell_n\}$ at $x = u$ is a permutation $\pi_u$ of $[n]$ such that for $1 \le i < j \le n/K$, $1 \le r, s \le K$,

$$y_{\pi_u((i-1)K+r)}(u) \le y_{\pi_u((j-1)K+s)}(u);$$

that is, $\pi_u$ partitions the lines into groups of size $K$, and for $1 \le i < j \le n/K$, all $y$-coordinates of the lines in group $i$ at $x = u$ are smaller than or equal to the $y$-coordinates of the lines in group $j$ at $x = u$. Write $u_i$, $i \ge 1$, for the points of the $\Delta$-sample of $X$, and $v_j$, $j \ge 1$, for the points of the $\delta$-sample of $X$. We take $\Delta = \Delta(u_i) = nK\log K$ and $\delta = \delta(v_j) = K\log K$.

At each point of the $\Delta$-sample store a $K$-grouping (of size $n$) of the lines at that point (the sorted order is a possible choice). At each point of the $\delta$-sample store the number of intersection points to the left of that point. The space used by the $\Delta$-sample and the $\delta$-sample is

$$\frac{N}{\Delta}\, n = O\!\left(\frac{n^2}{K\log K}\right) \quad\text{and}\quad \frac{N}{\delta}\cdot 1 = O\!\left(\frac{n^2}{K\log K}\right),$$

respectively. So the total space is $O(n^2/(K\log K))$. The following lemma will be useful for our algorithm. It shows that the information about the vertical ordering of the lines at a sample point $x = v$ can be obtained quickly from the partial order information at a nearby sample point $x = u$.

Lemma 2.1. Consider an interval $[u, v]$ on the $x$-axis such that the vertical strip $u \le x \le v$ contains at most $nK\log K$ vertices in $V$. Given a $K$-grouping of the lines at $x = u$, one can compute the (exact) order of the lines at $x = v$ in $O(n \log K)$ time.

Proof. This is a modification of a basic element of [3], with a much simpler proof, and may be of independent interest. The algorithm has two phases. In the first phase, we maintain a grouping of the lines at $x = v$ as a linked list of pointers to groups. Let $L_2$ denote this list. Each group is stored in an array of size $2K$ and has a size between $K$ and $2K$ during this phase. The elements of each group are the $y$-coordinates of the lines in that group at $x = v$. The $y$-coordinate of the lowest line in the group (at $x = v$), called the minimum of that group, is stored in the first position of its array. In the second phase the exact order of the lines at $x = v$ is computed from this representation.

The first phase scans the lines in the order given by the $K$-grouping at $x = u$. Let $L_1$ denote this list. The first group of $L_1$ generates the first group of $L_2$ (the $y$-coordinates at $x = v$ of the $K$ lines in the first group of $L_1$). At each subsequent step, a list of groups $L_2$ is available at $x = v$, and the current line $l$ of $L_1$ is inserted into the appropriate group of $L_2$. To insert $l$, its $y$-coordinate at $x = v$ is first compared to the $y$-coordinate of the lowest line in the last group of $L_2$ (the group having the largest minimum). The search then moves sequentially towards the head of the list $L_2$ until a proper group is found to accommodate the new line. The line $l$ (its $y$-coordinate at $x = v$) is inserted in the first free position (end of group) of the first group found to have a smaller minimum value. If the search reaches the first group of $L_2$, $l$ is inserted there and the minimum is recomputed (in constant time). When a group becomes too large (i.e., has size $2K$), it is split into two groups of size $K$ (a new group being created), which are relinked at the same position in $L_2$. This is done by selecting the median in the large group and moving the larger half ($K$ keys) to the first half of a new array of size $2K$.

The second phase computes the exact order of the lines at $x = v$ from the list of groups in $L_2$ (of sizes between $K$ and $2K$) by sorting each group and appending the results.

To analyze the complexity of the first phase, we account for the time to split large groups when creating new groups and for the time to search for the correct group when inserting new lines at $x = v$. At most $O(n/K)$ groups are created and each takes $O(K)$ time (the linear-time selection algorithm is employed). This amounts to $O(n)$. To account for the search time, assume that $c_l$ comparisons have been made to locate a group when inserting the current line $l$ at the new position $x = v$. Then at least $c_l - 1$ groups of size at least $K$ lines have been skipped. Write $i$ for the index of the line in the $K$-grouping at $x = u$, and write $j$ for the index of the line in the list obtained by concatenating all previous groups (from $L_2$) and the located group (from $L_2$) at that time. Then $i - j \ge (c_l - 1)K$, hence $l$ has at least $(c_l - 1)K$ intersections with other lines of smaller index $k < i$ at $x = u$ (see Fig. 1). We assumed that there are at most $nK\log K$ vertices in the strip $u \le x \le v$, which implies

$$\sum_l (c_l - 1)K \le nK\log K.$$

This bounds the complexity of the search, because

$$\mathrm{COST} = \sum_l c_l \le n\log K + n.$$

The second phase takes $(n/K)\,O(K\log K) = O(n\log K)$ time. Therefore the total complexity of the algorithm is $O(n\log K)$, which proves the lemma. ✷

Fig. 1. Counting intersections in a vertical strip.
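The two phases of the proof can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the name `order_at_v` and the `(m, b)` line representation are hypothetical, a Python list stands in for the linked list $L_2$ (so relinking after a split is not constant time), and a full group is split by sorting it rather than by linear-time median selection, so the sketch demonstrates the logic and not the exact $O(n \log K)$ bound.

```python
def order_at_v(lines, grouping, K, v):
    """Return line indices sorted by y-coordinate at x = v.

    lines    : list of (m, b) pairs
    grouping : line indices forming a K-grouping at some x = u <= v,
               lowest group first
    """
    def y(idx):
        m, b = lines[idx]
        return m * v + b

    # Phase 1: each group is [minimum y in group, list of (y, index)].
    first = [(y(i), i) for i in grouping[:K]]
    groups = [[min(first)[0], first]]
    for i in grouping[K:]:
        yi = y(i)
        g = len(groups) - 1
        # Walk from the last group towards the head until the group
        # minimum is no larger than yi (or the first group is reached).
        while g > 0 and groups[g][0] > yi:
            g -= 1
        groups[g][1].append((yi, i))
        if yi < groups[g][0]:
            groups[g][0] = yi
        if len(groups[g][1]) == 2 * K:
            # Split a full group into a lower and an upper half.
            items = sorted(groups[g][1])  # stand-in for median selection
            low, high = items[:K], items[K:]
            groups[g] = [low[0][0], low]
            groups.insert(g + 1, [high[0][0], high])

    # Phase 2: sort each group and concatenate.
    result = []
    for _, items in groups:
        result.extend(idx for _, idx in sorted(items))
    return result
```

The output is the exact order at $x = v$ for any input permutation; the quality of the $K$-grouping and the number of vertices in the strip only affect how far each backward search has to walk.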

The query algorithm is described next.

(1) Binary search with the query value $t$ for the $\Delta$-interval $[u_i, u_{i+1})$ and the $\delta$-interval $[v_j, v_{j+1})$ containing $t$. Note that the $\delta$-interval is a subinterval of the $\Delta$-interval. This step takes $O(\log n)$ time.

(2) Compute the (exact) ordering of the lines at $x = v_j$ from the ordering at $x = u_i$ in time $O(n \log K)$ (as described in Lemma 2.1).

(3) Perform a sweep with a vertical line from $x = v_j$ to $x = t$ using the line sweep algorithm of Bentley and Ottmann [1] (see also [5]).

The number $k$ of intersection points in the vertical strip $v_j \le x \le t$ satisfies $k \le K\log K$, and adding $k$ to the count stored at $v_j$ gives $\mathrm{Rank}(t)$. Since the time to initialize the sweep is $O(n)$ and processing each event during the sweep takes $O(\log n)$, the vertical sweep line reaches $x = t$ in

$$O(n + k\log n) = O(n + K\log K\log n)$$

time. For $K = O(n/\log n)$, this becomes

$$O\bigl(n + (n/\log n)(\log n)(\log K)\bigr) = O(n\log K).$$

Therefore the total time to process a query is $O(n\log K)$, and we have proved

Theorem 2.2. Using preprocessing space $O(n^2/(K\log K))$, the ranking query for line arrangements can be answered in time $O(n\log K)$, $K$ an integer satisfying $K = O(n/\log n)$.

Write $S(n)$ for the space and $T(n)$ for the query time. Then $S(n)T(n) = O(n^3/K)$. In particular, by selecting $K = \log n$, we obtain

$$S(n) = O\!\left(\frac{n^2}{\log n\,\log\log n}\right) = o(n^2),$$

$$T(n) = O(n\log\log n) = o(n\log n).$$

Remark. It would be interesting to decide if, using space $S(n) = O(n^{2-\varepsilon})$, a query can be answered in time $T(n) = o(n\log n)$, for some fixed $\varepsilon > 0$. More interesting from a practical point of view would be to obtain $T(n) = o(n)$ with space $S(n) = o(n^2)$.

3. The rank/search query for Cartesian sums

For two vectors $a = (a_1, \dots, a_n)$ and $b = (b_1, \dots, b_n)$, the Cartesian sum is the set of the $n^2$ sums

$$\Sigma = (\sigma_{ij}), \quad \sigma_{ij} = a_i + b_j.$$

Assume that both vectors are in sorted order. For the ranking query we want to count

$$\mathrm{Rank}(t) = \bigl|\{\sigma_{ij} \in \Sigma : \sigma_{ij} \le t\}\bigr|$$

using storage $S$, and in time $T(S)$. It will be useful to describe the algorithm for the case when $S = O(n)$. Call it ALG A: We start with RANK $= 0$, $i = n$ and $j = 1$. In each step, compare $\sigma_{ij}$ with $t$. If $\sigma_{ij} \le t$, we add $i$ to RANK and increase $j$ by one; otherwise we decrease $i$ by one. The process continues until $i < 1$ or $j > n$. Clearly it uses at most $2n - 1 = O(n)$ comparisons.

To use $S = o(n^2)$ storage we will choose an integer $K > 0$, $K \mid n$, and partition $\Sigma$ into $(n/K)^2$ square submatrices $\Sigma_{rs}$ of size $K^2$ each, where for $1 \le r, s \le n/K$,

$$\Sigma_{rs} = \{\sigma_{ij} : (r-1)K < i \le rK,\ (s-1)K < j \le sK\}.$$
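The linear-space procedure ALG A can be written as a short loop. The sketch below assumes both input lists are sorted in increasing order and uses the 1-based indices $i$, $j$ of the text; the loop runs while $i \ge 1$ and $j \le n$, since $i$ only decreases and $j$ only increases. The function name `rank_linear` is an assumption of this sketch.

```python
def rank_linear(a, b, t):
    """Count pairs (i, j) with a[i] + b[j] <= t for sorted lists a and b.

    Uses O(n) comparisons: every step either advances j or retreats i.
    """
    n = len(a)
    rank = 0
    i, j = n, 1  # 1-based indices, as in the description of ALG A
    while i >= 1 and j <= n:
        if a[i - 1] + b[j - 1] <= t:
            # a_1..a_i all pair with b_j to give a sum <= t.
            rank += i
            j += 1
        else:
            # a_i is too large for b_j and for every later b.
            i -= 1
    return rank
```

With `a = b = [0, 1, 2, 3]` and `t = 3`, the pairs with sum at most 3 number $4 + 3 + 2 + 1 = 10$, which is what the walk accumulates.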

If the sorted order of the elements in $\Sigma_{rs}$ were known, the integer

$$\mathrm{COUNT}(r, s) = \bigl|\{\sigma_{ij} \in \Sigma_{rs} : \sigma_{ij} \le t\}\bigr|$$

could be computed in time $O(\log K)$ using binary search on an array of size $K^2$. The following algorithm to answer the ranking query is based on ALG A, above.

Start with RANK $= 0$, $r = n/K$, and $s = 1$. In each step we compare $\sigma_{rK,sK}$ to $t$. If $\sigma_{rK,sK} \le t$ we add $rK^2$ to RANK and increase $s$ by one; otherwise we add COUNT$(r, s)$ to RANK and decrease $r$ by one. The process continues until $r < 1$ or $s > n/K$. The complexity is $O(n/K)$ comparisons and $O(n/K)$ calls of COUNT$(r, s)$ which, assuming $\Sigma_{rs}$ is sorted, gives

$$T = O\bigl((n/K)\log K\bigr).$$


We use a coding device to enable us to run COUNT$(r, s)$ in time $O(\log K)$ at query time. In preprocessing, for each of the $(n/K)^2$ submatrices $\Sigma_{rs}$ of size $K^2$, we store a pointer to an array with $K^2$ pairs of indices, each with values in $\{1, \dots, K\}$. These pairs describe the sorted order of the values in the corresponding submatrix $\Sigma_{rs}$. For example, $(1,1), (1,2), (2,1), (2,2)$ says that in the 2 by 2 matrix $\Sigma$, $\sigma_{11} \le \sigma_{12} \le \sigma_{21} \le \sigma_{22}$. The set of arrays of size $K^2$ can be stored in a table $M$ with $R$ rows, where $R = O(K^{8K})$ (there are at most this number of distinct orderings of the $K^2$ elements in $\Sigma_{rs}$, see [4]). The space taken to store $M$ is $\le 2K^2 R = O(K^{8K+2})$. We store a pointer into $M$ for each $\Sigma_{rs}$, so the total space for the list of pointers and the table is

$$\left(\frac{n}{K}\right)^2 + O(K^{8K+2}).$$

For some appropriate constant $c > 0$, as long as $K \le c(\log n)/(\log\log n)$, the space is bounded by $O(n^2/K^2)$, and we have proved

Theorem 3.1. Using preprocessing space $S = O(n^2/K^2)$, the ranking query for Cartesian sums can be answered in time $T = O((n/K)\log K)$, $K$ an integer satisfying $K \le c(\log n)/(\log\log n)$, for some suitable positive constant $c$.
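The pointer-and-table idea of this section can be sketched as follows. All names (`preprocess`, `count_block`, `blocked_rank`) are assumptions of this sketch, indices are 0-based, and a Python dict that interns ordering tuples stands in for the table $M$. There is one deliberate difference from the walk described above: the sketch scans each block column and calls `count_block` on every block whose minimum is $\le t$ and whose maximum is $> t$, because a block row that straddles $t$ in one column can still contain sums $\le t$ in the next column. The number of such blocks is still $O(n/K)$, since the boundary between sums $\le t$ and sums $> t$ is a monotone staircase.

```python
def preprocess(a, b, K):
    """Store, for each K x K block, a pointer to its shared ordering tuple."""
    nb = len(a) // K  # assumes K divides len(a) == len(b); a, b sorted
    table = {}        # distinct orderings, shared between blocks
    ptr = [[None] * nb for _ in range(nb)]
    for r in range(nb):
        for s in range(nb):
            order = tuple(sorted(
                ((i, j) for i in range(K) for j in range(K)),
                key=lambda p: a[r * K + p[0]] + b[s * K + p[1]]))
            ptr[r][s] = table.setdefault(order, order)
    return ptr


def count_block(a, b, K, ptr, r, s, t):
    """Binary search over the stored ordering: sums <= t in block (r, s)."""
    order = ptr[r][s]
    lo, hi = 0, len(order)
    while lo < hi:
        mid = (lo + hi) // 2
        i, j = order[mid]
        if a[r * K + i] + b[s * K + j] <= t:
            lo = mid + 1
        else:
            hi = mid
    return lo


def blocked_rank(a, b, K, ptr, t):
    nb = len(a) // K
    total = 0
    r_hi = nb  # block rows 0..r_hi-1 may still contain sums <= t
    for s in range(nb):
        # Drop rows whose smallest sum in this column already exceeds t.
        while r_hi > 0 and a[(r_hi - 1) * K] + b[s * K] > t:
            r_hi -= 1
        r = r_hi - 1
        # Straddling blocks: largest sum > t, so count by binary search.
        while r >= 0 and a[r * K + K - 1] + b[s * K + K - 1] > t:
            total += count_block(a, b, K, ptr, r, s, t)
            r -= 1
        # Remaining rows 0..r lie entirely at or below t.
        total += (r + 1) * K * K
    return total
```

The preprocessing here sorts every block explicitly and is not tuned for running time; only the stored representation (one pointer per block plus the shared orderings) follows the scheme in the text.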

Write $S(n)$ for the space and $T(n)$ for the query time. Then

$$S(n)T(n) = O\bigl((n/K)^3\log K\bigr).$$

In particular, by selecting $K = c(\log n)/(\log\log n)$, we obtain

$$S(n) = O\!\left(\frac{n^2(\log\log n)^2}{(\log n)^2}\right) = o(n^2),$$

$$T(n) = O\!\left(\frac{n(\log\log n)^2}{\log n}\right) = o(n).$$

Remark. One can take $c = 1/5$ in the above calculation.
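As a rough check of the choice $c = 1/5$, the table term can be bounded as follows. The derivation assumes base-2 logarithms and $n$ large enough that $K = c(\log n)/(\log\log n) \le \log n$, so that $\log K \le \log\log n$; both are assumptions of this sketch and are not stated in the text.

```latex
\begin{align*}
K^{8K} &= 2^{8K\log K}
        \le 2^{\,8c\,\frac{\log n}{\log\log n}\,\log\log n}
        = n^{8c} = n^{8/5}, \\
K^{8K+2} &\le n^{8/5}\log^2 n = o\!\left(\frac{n^2}{\log^2 n}\right), \\
\frac{n^2}{\log^2 n} &\le \frac{n^2}{K^2}
  \quad\Longrightarrow\quad
  \left(\frac{n}{K}\right)^2 + O\bigl(K^{8K+2}\bigr) = O\!\left(\frac{n^2}{K^2}\right).
\end{align*}
```

The middle line holds because $n^{8/5}\log^4 n / n^2 = \log^4 n / n^{2/5} \to 0$.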

References

[1] J.L. Bentley, T. Ottmann, Algorithms for reporting and counting geometric intersections, IEEE Trans. Comput. 28 (1979) 643–647.

[2] B. Chazelle, Personal communication.

[3] R. Cole, J. Salowe, W. Steiger, E. Szemerédi, An optimal time algorithm for slope selection, SIAM J. Comput. 18 (1989) 792–810.

[4] M. Fredman, How good is the information theory bound in sorting?, Theoret. Comput. Sci. 1 (1976) 355–361.

[5] F. Preparata, M. Shamos, Computational Geometry, An Introduction, Springer, New York, 1985.

[6] M. Sharir, Personal communication.