
Compressing Relations and Indexes

Jonathan Goldstein    Raghu Ramakrishnan    Uri Shaft

Computer Sciences Department
University of Wisconsin-Madison

Abstract

We propose a new compression algorithm that is tailored to database applications. It can be applied to a collection of records, and is especially effective for records with many low to medium cardinality fields and numeric fields. In addition, this new technique supports very fast decompression.

Promising application domains include decision support systems (DSS), since "fact tables", which are by far the largest tables in these applications, contain many low and medium cardinality fields and typically no text fields. Further, our decompression rates are faster than typical disk throughputs for sequential scans; in contrast, gzip is slower. This is important in DSS applications, which often scan large ranges of records.

An important distinguishing characteristic of our algorithm, in contrast to compression algorithms proposed earlier, is that we can decompress individual tuples (even individual fields), rather than a full page (or an entire relation) at a time. Also, all the information needed for tuple decompression resides on the same page as the tuple. This means that a page can be stored in the buffer pool and used in compressed form, simplifying the job of the buffer manager and improving memory utilization.

Our compression algorithm also improves index structures such as B-trees and R-trees significantly by reducing the number of leaf pages and compressing index entries, which greatly increases the fan-out. We can also use lossy compression on the internal nodes of an index.

1 Introduction

Traditional compression algorithms such as Lempel-Ziv [10] (the basis of the standard gzip compression package) require uncompressing a large portion of the file even if only a small part of that file is required. For example, if a relation containing employee records is compressed page-at-a-time, as in some current DBMS products, a page's worth of data must be uncompressed to retrieve a single tuple. Page-at-a-time compression also leads to compressed "pages" of varying length that must somehow be packed onto physical pages, and the mapping between the original pages/records and the physical pages containing compressed versions must be maintained. In addition, compression techniques that cannot decompress individual tuples on a page store the page decompressed in memory, leading to poorer utilization of the buffer pool (in comparison to storing compressed pages).

We present a compression algorithm that overcomes these problems. The algorithm is simple, and can be easily added to the file management layer of a DBMS since it supports the usual technique of identifying a record by a (pageid, slotid) pair, and requires only localized changes to existing DBMS code. Higher layers of the DBMS code are insulated from the details of the compression technique (obviously, query optimization needs to take into consideration the increased performance due to compression). In addition, this new technique supports very fast decompression of a page, and even faster decompression of individual tuples on a page. Our contributions are:

Page level Compression. We describe a compression algorithm for collections of records (and index entries) that can essentially be viewed as a new page-level layout for collections of records (Section 2). It allows decompression at the level of a specified field of a particular tuple; all other proposed compression techniques that we are aware of require decompressing an entire page. Scenarios that illustrate the importance of tuple-level decompression are presented in Section 4.


Performance Study. We implemented all the algorithms presented in the paper, and applied them to various real and synthetic data sets. We measured both compression ratios and processing speed for decompression. Section 4 contains a summary of the results. (A very detailed presentation of the results can be found in [6].)

The performance analysis underscores the importance of compression in the database context. Current systems, in particular Sybase IQ, use a proprietary variant of gzip, applied page-at-a-time. (We thank Clark French at Sybase IQ for giving us information about the use of compression in Sybase IQ.) We compare our results to a compression technique similar to the Sybase IQ compression (we used gzip on 64KB blocks). We typically get better compression ratios (as high as 88 to 1). Our compression and decompression techniques are much faster than gzip, and are even fast enough to keep up with sequential I/O.

Application to B-trees and R-trees. We study the application of our technique to index structures (e.g., B-trees and R-trees) in Section 3. We compress keys on both the internal pages (where keys are hyper-rectangles) and the leaf pages (where keys are either points or hyper-rectangles). Further, for R-trees we can choose between lossy and lossless compression in a way that exploits the semantics of an R-tree entry, a capability that is not possible with other compression algorithms. The key represents a hyper-rectangle, and in lossy compression we can use a larger hyper-rectangle and represent it in a smaller space.

Multidimensional bulk loading algorithm. We can exploit a sort order over the data to gain better compression. B-tree sort orders can be utilized well for that purpose. We present a bulk loading algorithm that essentially creates a sort order for R-trees. The bulk loading algorithm creates the levels of an indexing structure in bottom-up order. If we only create the leaf level, we get a compressed relation. Creating the next levels has little additional cost (typically, no more than 10% of the space cost of the leaf level). Thus, the total size of the compressed clustered index is much less than the size of the original relation.

2 Compressing a relation

Our relation compression algorithm has two main components. The first component is called page level compression. It takes advantage of common information amongst tuples on a page. This common information is called the frame of reference for the page. Using this frame of reference, each field of each tuple can be compressed, sometimes quite significantly; thus many more tuples can be stored on a page using this technique than would be possible otherwise. The compression is done incrementally while tuples are being stored, either at bulk-loading time or during run-time inserts of individual tuples. This ensures that additional tuples can fit onto a page, taking advantage of the space freed by compression. Section 2.1 describes page level compression in detail.

The second component of the relation compression algorithm is called file level compression. This component takes a list of tuples (e.g., an entire relation) and divides the list into groups s.t. each group can fit on a disk page using page level compression. Section 2.3 describes file level compression in detail.

The most important aspects of our compression technique are:

• Each compressed data page is independent of the other pages. Each tuple in each page can be decompressed based only on information found on the specific page. Tuples, and even single fields, can be decompressed without decompressing the entire page (let alone the entire relation).

• A compressed tuple can be identified by a page-id and a slot-id in the same way that uncompressed tuples are identified in conventional DBMSs.

• Since tuples can be decompressed independently, we can store compressed pages in the buffer pool, without decompressing them. The way tuple-id's are used does not change with our compression technique. Thus, incorporating our compression technique in an existing DBMS involves changes only to the page level code and to the query optimizer.

• A compressed page can be updated dynamically without looking at any other page. This means that a compressed relation can be updated without using file level compression. However, using file level compression will result in better compression.

2.1 Page level compression: frames of reference

Our basic observation is as follows: if we consider the actual range of values that appear in a given column on a given page, this is much smaller than the range of values in the underlying domain. For example, if the first column contains integers and the smallest value on the page in this column is 33 and the largest is 37, the range (33, 37) is much smaller than the range of integers that can be represented (without overflow). If we know the range of potential values, we can represent any value in this range by storing just enough bits to distinguish between the values in this range. In our example, if we remember that only values in the range (33, 37) can appear in the first column on our example page, we can specify a given value in this range by using only 3 bits: 000 represents 33, 001 represents 34, 010 represents 35, 011 represents 36, and 100 represents 37.

Consider a set S of points and collect, from S, the minima and maxima for all dimensions in S. The minima and maxima provide a frame of reference F in which all the points lie. For instance, if S = {(511, 1001), (517, 1007), (514, 1031)} then F = [511, 517] × [1001, 1031].

The frame of reference tells us the range of possible values in each dimension for the records in the set S. For instance, the points along the X-axis only vary between 7 values (511 to 517 inclusive), and the points along the Y-axis vary between 31 values. Only 3 bits are actually needed to distinguish between all the X values that actually occur inside our frame of reference, and only 5 bits are needed to distinguish between Y values. The set of points S would be represented using the following bit strings: S = {(000, 00000), (110, 00110), (011, 11110)}.

Since the number of records stored on a page is typically in the hundreds, the overhead of remembering the frame of reference is well worth it: in our example, if values were originally stored as 32 bit integers, we can compress these points with no loss of information by (on average) a factor of 4 (without taking into account the overhead of storing the frame of reference)!
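To make the encoding concrete, here is a small illustrative sketch in Python (not the paper's implementation; the helper names and the exact bit-packing layout are our own assumptions). It computes the frame of reference for a page of tuples, packs each field using just enough bits for its range, and decompresses a single field of a single tuple without touching the rest of the page:

    import math

    def bits_needed(lo, hi):
        # Bits required to distinguish all integers in the range [lo, hi].
        span = hi - lo + 1
        return 0 if span <= 1 else math.ceil(math.log2(span))

    def compress_page(tuples):
        # tuples: a list of equal-length tuples of integers (one page's worth).
        # Returns the frame of reference, the per-field bit widths, and each
        # tuple packed into one integer (standing in for a bit string).
        dims = len(tuples[0])
        frame = [(min(t[d] for t in tuples), max(t[d] for t in tuples))
                 for d in range(dims)]
        widths = [bits_needed(lo, hi) for lo, hi in frame]
        packed = []
        for t in tuples:
            code, shift = 0, 0
            for d, (lo, _) in enumerate(frame):
                code |= (t[d] - lo) << shift      # store the offset from the minimum
                shift += widths[d]
            packed.append(code)
        return frame, widths, packed

    def decompress_field(code, frame, widths, d):
        # Decompress one field of one tuple; no other data on the page is touched.
        shift = sum(widths[:d])
        lo = frame[d][0]
        return ((code >> shift) & ((1 << widths[d]) - 1)) + lo

    frame, widths, packed = compress_page([(511, 1001), (517, 1007), (514, 1031)])
    # widths == [3, 5]; decompress_field(packed[2], frame, widths, 1) == 1031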

Figure 1: Frame of reference for lossy compression (2 bits per dimension = 4 equally spaced values along each axis).

Sometimes, it is sufficient to represent a point or rectangle by a bounding rectangle, e.g., in the index levels of an R-tree. In this case, we can reduce the number of bits required as much as we want by trading off precision. The idea is that we can use bits to represent equally spaced numbers within the frame of reference (see Figure 1), thereby creating a uniform 'grid' whose coarseness depends on the number of bits used to represent 'cuts' along each dimension. Each original point or rectangle is represented by the smallest rectangle in the 'grid' that contains it. If the original data consists of points, these new rectangles are always of width 1 along any dimension, and we can represent such a rectangle by simply using its 'min' value along every dimension, e.g., the lower left corner in two dimensions (see Figure 2). For instance, if 2 bits per dimension were used for both the X and Y axes on S, then S = {(00, 00), (11, 00), (01, 11)}.

Figure 2: Point approximation in lossy compression (each actual point is replaced by an estimated point and its estimating rectangle in the grid).
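A hedged sketch of this lossy variant follows (illustrative only; the convention for points that fall exactly on a grid boundary is our assumption): each coordinate is mapped to one of 2^b equally spaced cells inside the frame of reference, and a point is represented by the 'min' corner of its cell.

    def lossy_quantize(point, frame, bits_per_dim):
        # Map each coordinate to one of 2**bits_per_dim equally spaced grid cells
        # inside the frame of reference; the point is represented by its cell
        # indices (the lower-left corner of the enclosing grid rectangle).
        cells = 1 << bits_per_dim
        out = []
        for x, (lo, hi) in zip(point, frame):
            width = (hi - lo) / cells                 # width of one grid cell
            cell = 0 if width == 0 else min(cells - 1, int((x - lo) / width))
            out.append(cell)
        return tuple(out)

    frame = [(511, 517), (1001, 1031)]
    # With 2 bits per dimension the grid has 4 cells per axis, as in Figure 1;
    # (517, 1007) maps to cells (3, 0), i.e. bit strings (11, 00).
    print(lossy_quantize((517, 1007), frame, 2))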

2.2 Non-numeric attributes

The page level compression technique, as described in Section 2.1, applies only to numeric attributes. However, in some situations we can compress non-numeric attributes. It is common practice in decision support systems (DSS) to identify attributes that have low cardinality for special treatment (see [8]). Low cardinality attributes are attributes that have a very limited range of valid values. For example, gender, marital-status, and state/country have very limited ranges although the valid values for these attributes are not numeric.

In such systems, it is common practice to map the values to a set of consecutive integers, and use those integers as id's for the actual values. The table containing the mapping of values to integers is a dimension table. The fact table, which is the largest table in the system, contains the integers. We recommend building such dimension tables for attributes with low and medium cardinality (i.e., up to a few thousand valid values). We get good compression on the fact table, and the dimension tables are small enough to fit in memory or in very few disk pages.
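As an illustration of this mapping (a sketch, not a prescribed implementation), a dimension table can be as simple as a dictionary assigning consecutive integer ids to the distinct values, with the fact table storing only the ids:

    def build_dimension_table(column_values):
        # Assign consecutive integer ids to distinct values (the dimension table)
        # and re-encode the column with those ids (what goes into the fact table).
        ids = {}
        for v in column_values:
            if v not in ids:
                ids[v] = len(ids)
        encoded = [ids[v] for v in column_values]
        return ids, encoded

    ids, encoded = build_dimension_table(["WI", "CA", "WI", "NY", "CA"])
    # ids == {'WI': 0, 'CA': 1, 'NY': 2} and encoded == [0, 1, 0, 2, 1];
    # the small-range integer column now compresses well under a frame of reference.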

2.3 File level compression

The degree of compression obtained by our page-level compression technique depends greatly on the range of values in each field for the set of tuples stored on a page. Thus, the effectiveness of the compression can be increased, often dramatically, by partitioning the tuples in a file across pages in an intelligent way. For instance, if a database contains 500,000 tuples, there are many ways to group these tuples, and different groupings may yield drastically different compression ratios. [13] demonstrates the effectiveness of using a B-tree sort order to assign tuples to pages. In Section 3 we further develop the connection between index sort orders, including multidimensional indexes like R-trees, and improved compression.

In this section we present an algorithm for grouping tuples into compressed pages. We assume that the tuples are already sorted. The grouping of the tuples maintains the given sort order. The algorithm works as follows:

Input: An ordered list of tuples t1, t2, ..., tn.

Output: An ordered list of pages containing the compressed tuples. The order of the tuples is maintained (i.e., the tuples in the first page are {t1, t2, ..., ti} for some i; the tuples in the second page are {ti+1, ti+2, ..., tj} for some j > i, etc.).

Method: This is a greedy algorithm. We find the maximal i s.t. the set {t1, t2, ..., ti} fits (in compressed form) on a page. We put this set in the first page. Next, we find the maximal j s.t. the set {ti+1, ti+2, ..., tj} fits on a page. We put this set in the second page. We continue in this way until all tuples are stored in pages.

Note that, given the restriction of using our page level compression and the order of tuples, this greedy algorithm achieves optimal compression.
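A minimal sketch of this greedy grouping follows, assuming a hypothetical compressed_size() helper (here, the frame-of-reference header plus the bit-packed tuples, reusing bits_needed from the earlier page-level sketch) and an assumed page size:

    PAGE_SIZE = 8192   # assumed page size in bytes

    def compressed_size(tuples):
        # Hypothetical stand-in for the page-level compressor's size estimate:
        # a per-dimension frame of reference plus the bit-packed tuples.
        dims = len(tuples[0])
        frame = [(min(t[d] for t in tuples), max(t[d] for t in tuples))
                 for d in range(dims)]
        bits_per_tuple = sum(bits_needed(lo, hi) for lo, hi in frame)
        header_bytes = dims * 8                     # two 32-bit bounds per dimension
        return header_bytes + (bits_per_tuple * len(tuples) + 7) // 8

    def pack_file(sorted_tuples):
        # Greedy grouping: extend the current page while the compressed group fits.
        pages, current = [], []
        for t in sorted_tuples:
            if current and compressed_size(current + [t]) > PAGE_SIZE:
                pages.append(current)
                current = []
            current.append(t)
        if current:
            pages.append(current)
        return pages

Recomputing compressed_size from scratch for every tuple is quadratic per page; an implementation would maintain the frame of reference incrementally, but the greedy structure is the same.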

3 Compressing an indexing structure

Many indexing structures, including R-tree variants (e.g., [7, 3]), B-trees [2], grid files [14], buddy trees [9], TV-trees [12] (using the L1 metric), and X-trees [4], consist of collections of (rectangle, pointer) pairs (for the internal nodes) and (point, data) pairs (for the leaf nodes). Our main observation is: all these indexing structures try to group similar objects (n-dimensional points) on the same page. This means that within a group, the range of values in each dimension should be much smaller than the range of values for the entire data set (or even the range of values in a random group that fits on a page). Hence, our compression technique can be used very effectively on these indexing structures, and is especially useful when the search key contains many dimensions.

While the behavior of our compression technique when used in index structures is similar in some ways to B-tree prefix compression, our compression scheme is different in that:

• We translate the minimum value of our frame of reference to 0 before compressing.

• Lossy compression makes better use of bits for internal nodes than prefix compression, since all bit combinations fall inside our frame of reference.

• We also compress leaf level entries, unlike prefix compression, which is applied only at non-leaf nodes.

Compressing an indexing structure can yield major benefits in space utilization and in query performance. In some cases, indexing structures take more disk space than any other part of the system. (Some commercial systems store data only in B-trees.) In those cases, the space utilization of indexing structures is important. The performance of I/O bound queries can increase dramatically with compression. If the height of a B-tree is lower, then exact match queries have better performance. In all cases, each page I/O retrieves more data, thus reducing the cost of the query.

Another reason for compression is the quality of the indexing structure. Our work in R-trees yields the following result: as dimensionality (number of attributes) increases, we need to increase the fan-out of the internal nodes to achieve reasonable performance. Compressing index nodes increases the utility of R-trees (and similar structures) by increasing the fan-out.

In Section 3.1 we describe our B-tree compression technique. In Section 3.2 we discuss dynamic and bulk loaded multidimensional indexing structures. Of particular interest is our bulk loading algorithm for compressed rectangle based indexing structures (called GBPack).

3.1 Compressing a B-tree

The objects stored in the B-tree can be either (key, pointer) pairs, or entire tuples (i.e., (key, data) pairs). The internal nodes of the B-tree contain (key, pointer) pairs and one extra pointer. In both cases, we can compress groups of these objects using our page level compression. (The extra pointer in internal nodes can be put in the page header. It can also be paired with a "dummy" key that lies inside the frame of reference.)

3.1.1 Dynamic compressed B-trees. Note that compressed pages can be updated without considering other pages. However, updates to entries in a compressed page may change the frame of reference. When implementing a dynamic compressed B-tree, we need to observe the following:

• When trying to insert an object into a compressed page, we may need to split the page. In the worst case, the page may split into three pages. (Details on our algorithm for dynamic changes in the frame of reference are in [6].) This may happen when the new object is between the objects on the page (in terms of B-tree order), and the frame of reference changes dramatically because of the new object. (Note that this can happen only if the key has multiple attributes.) The B-tree insertion algorithm should be modified to take care of this case.

• We may not be able to merge two neighboring pages even if the space utilized in these pages amounts to less than one page.

• When deleting entries we can choose to change the frame of reference. However, it is not necessary to do so.

3.1.2 Bulk loading. We sort the items in B-tree sort order, after concatenating the key of each item with the corresponding pointer or data. We use the file level compression algorithm on these sorted items (see Section 2.3). The resulting sorted list of pages is the leaf level of the B-tree.

We create the upper levels of the B-tree in a bottom-up order. For each level, we create a list of (key, pointer) pairs that corresponds to the boundaries of the pages in the level below it. We compress this list using the same file level compression algorithm. The resulting sorted list of pages is another level of the B-tree.
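Reusing the pack_file sketch above, the bottom-up construction might look like the following (a sketch under the assumptions that keys and data are already encoded as integers and that a child page is referenced by its position in the level below):

    def bulk_load_btree(sorted_items):
        # sorted_items: (key, data) tuples already in B-tree sort order.
        # Builds the tree bottom-up; each level is a list of compressed pages.
        levels = []
        level = pack_file(sorted_items)                  # leaf level (Section 2.3)
        levels.append(level)
        while len(level) > 1:
            # One (separator key, child page number) entry per page in the level below.
            boundaries = [(page[0][0], page_no) for page_no, page in enumerate(level)]
            level = pack_file(boundaries)
            levels.append(level)
        return levels                                    # levels[-1] is the root level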

3.2 Compressing a rectangle based indexing structure

Most multidimensional indexing structures are rectangle based (e.g., R-trees [7], X-trees [4], TV-trees [12], etc.). They all share these qualities:

• These are height-balanced hierarchical structures.

• The objects stored in the indexing structure are either points or hyper-rectangles in some n-dimensional space.

• The internal nodes consist of (rectangle, pointer) pairs. The pointer points to a node one level below in the tree. The rectangle is a minimum bounding rectangle (MBR) of all the objects in the subtree pointed to by the pointer.

• All the MBR's are oriented orthogonally with respect to the (fixed) axes.

In this section we describe our R-tree compression technique. The discussion is valid for the other rectangle based indexing structures as well.

3.2.1 Compression of R-tree nodes. Our page level compression technique can be modified for groups of (rectangle, pointer) pairs. Note that an n-dimensional rectangle can be viewed as a 2n-dimensional point (i.e., it is represented by minimum and maximum values for each dimension). We can use page level compression on 2n-dimensional points. We can also save some space by using an n-dimensional frame of reference and treating each rectangle as two n-dimensional points. Note that the pointer part of the pair can be treated as another dimension and be compressed as well.
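For illustration (a sketch; the layout is our assumption, not the paper's exact format), an n-dimensional MBR plus its child pointer can be flattened into a single tuple of 2n+1 fields and handed to the same page-level compressor:

    def rect_entry_to_tuple(rect, pointer):
        # rect: list of (lo, hi) intervals, one per dimension; pointer: child page id.
        # Flattens the n-dimensional rectangle into a 2n-dimensional point and
        # appends the pointer as one more compressible column.
        row = []
        for lo, hi in rect:
            row.extend((lo, hi))
        row.append(pointer)
        return tuple(row)

    entries = [([(10, 14), (100, 120)], 7),     # (MBR, pointer) pairs
               ([(12, 20), (110, 140)], 8)]
    rows = [rect_entry_to_tuple(r, p) for r, p in entries]
    # rows == [(10, 14, 100, 120, 7), (12, 20, 110, 140, 8)]; these can now be
    # compressed with the page-level scheme (e.g., the compress_page sketch above).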

We make the observation that an R-tree can be used even if the bounding rectangles in the internal nodes are not minimal. In this case, the performance of the indexing structure degrades, but the correctness of the search and update algorithms remains intact. In some cases, the leaf level of the tree may contain (rectangle, pointer) pairs where the rectangle is a bounding rectangle of some complex object. Again, we may use larger bounding rectangles (i.e., not minimal). In this case, queries may return a superset of the required answers, resulting in some postprocessing. When the minimality of bounding rectangles is not a necessity, we can use lossy compression. There is a tradeoff between the degradation of performance due to larger bounding rectangles, and the gain in performance due to better compression.

3.2.2 Dynamic maintenance of the tree. Since our compression of pages is independent of other pages, we can update the indexing structure dynamically. Similar to compressed B-trees, we need to consider changes in frames of reference due to updates (see [6]). In the R-tree case, splitting a node can result in at most two pages (with B-trees it can result in three). The difference is that the order of objects is not important in an R-tree, so the worst case is realized when the new entry is put in a new page by itself.


3.2.3 GBPack: compression oriented bulk loading for R-trees. All bulk loading algorithms in this paper partition a set of points or rectangles into pages. The partitioning problem can be described as follows:

Input. A set (or multiset) of points (or rectangles) in some n-dimensional space. We assume that each dimension (axis) of that space has a linear ordering of values.

Output. A partition of the input into subsets. The subsets are usually identified with index nodes or disk pages.

Requirements. The partition should group points (or rectangles) that are close to each other in the same group as much as possible. The partition should also be as unbiased as possible with respect to (a set of specified) dimensions.

We bulk-load the R-tree by applying the above problem to each level of the R-tree. We do the bulk loading in a bottom-up order. First, the data items (points or rectangles) are partitioned and compressed into pages. Second, we create a set of (rectangle, pointer) pairs, each composed of a bounding rectangle of a leaf page and a pointer to that page. We apply the above problem to compress this set. We continue this process until a level fits on a compressed page that becomes the root of the R-tree.

We solve the partition problem by ordering the set of points. Then we apply the packing algorithm (described in Section 2.3). If we have a set of rectangles, we use the ordering of the center points of the rectangles, and then apply the packing algorithm to the rectangles. The most important part is finding a good ordering of the points.

First, we'll give an example of the sorting algorithm. Then, we'll describe it in detail.

Figure 3: Example for the bulk loading sort operator (Using GBPack); the points fall into Partitions 1 to 4 along the X dimension and are packed into pages.

The following example in two dimensions demonstrates the GBPack algorithm. Consider the set of 44 points shown as small circles in Figure 3. Suppose we determine that the total number of pages needed is 15. We sort the set on the X dimension in ascending order. Then we define p := ⌈√15⌉ = 4 as the number of partitions along the X dimension; taking the square root reflects our assumption that the number of partitions along each of the two dimensions is equal (i.e., we expect each of the X-partitions to be cut into 4 partitions along the Y dimension). We sort the first partition on the Y dimension in ascending order. Then the second partition is sorted in descending order, the third in ascending order, and the fourth in descending order. Figure 3 shows the partitions. The arrow in the figure shows the general ordering of points for that dataset. Note that the linearization generated by the alternation of sort order guarantees that all but the last page are fully packed, at the expense of a little spatial overlap amongst the leaf pages. In the above description, we assumed that the number of pages needed was known to be 15. This number was then used to determine the number of partitions along each axis. In actuality, we don't know the final number of pages needed since it depends on the compression obtained, which depends on the data. Therefore, we use an estimate of the number of pages, obtained by assuming that we have the bounding box of all the data and that values in tuples are uniformly distributed over this range.

Figure 4: Example for the bulk loading sort operator (Using STR); the points fall into Partitions 1 to 4 and are packed into pages. Bold points belong to partitions to the right of the point.

Finally, in the case of low cardinality data, we want to guarantee that the partition divisions happen along changes in value of the dimension being cut. This results in a much better division of the partitions into small ranges when one considers the degenerate case of low cardinality data. This is a result of narrowing the ranges over all the pages in the dataset, since the same value isn't in more than one partition (since the partitions don't overlap). For instance, in Figure 4, note that one of the natural partition divisions occurred between two points that had the same X value. Nonetheless, we did not make the partition there, since it would have reduced compression of our dataset. See [6] for more details.

Our partitioning technique is similar to STR [11] in that, starting with the first dimension, we divide the data in the leaf pages into 'strips'. For instance, Figure 4 shows how STR decomposes the data space into pages. Since, in STR, we are sorting uncompressed data, we can calculate exactly the number of pages P produced by the bulk loading algorithm. P, in this case 15, is used to calculate the number of pages in each strip (except the last). In this case, we determine that the first three strips have four pages, while the last has three. Thus, since the first three strips contain four pages each, each strip contains 4*3=12 points. The last strip contains whatever points are left (8 in this case). Each of the strips is then grouped into partitions with 3 entries each, except the very last page, which contains 1 point. Note that all pages are fully packed except the last.

The important differences between our algorithm and STR arise from three considerations:

• We are using our bulk loading algorithm to pack data onto compressed pages, and are willing to trade off some tree quality for increased compression.

• The degree of compression is based on the data on a given page, and this makes the number of entries per page data-dependent.

• In the case of low cardinality data, it is very important that when we cut a dimension, it is done on a value boundary in the data.

The first item listed above simply means that we pack pages more aggressively at the expense of increased spatial overlap among the partitions. We do this by creating a linearization of the data that allows us to 'steal' data from a neighboring partition to fill a partially empty partition. In particular, if one considers the above example, there is a partially empty page at the top of each strip. These pages can be filled when one considers the effect of reversing the sort order of each strip. The results are illustrated by Figure 3. Once this linear ordering is achieved, the data may be packed onto pages from the beginning of the linearization to the end. The second point means that we have to estimate the required number of pages. The third point constrains how we determine partition boundaries.

We now present the GBPack algorithm in more detail. The ordering of points is determined by a sorting function sort(A, k, D). The arguments are:

A: An array of items to be ordered. This is a 2-dimensional array of integers where the first array index is the tuple number and the second identifies a particular dimension (i.e., A[i] is tuple i, and A[i][j] is the value of the jth dimension for the ith tuple). We use the notation |A| to refer to the number of tuples in A.

k: The number of elements in A.

D: An integer identifying the dimension we want to sort on.

In addition, there exists the following global variable:

S: An array of d booleans, where d is the dimensionality of the data. If S[d] is true, the current sort order for dimension d is ascending; otherwise it is descending. The initial value of S does not matter.

The function sort(A, k, D) has the following steps:

1. S[D] = not S[D]. Reverse the sort order for dimension D. This step ensures the linearization of the space illustrated in the example given earlier in this section.

2. Sort the array A on dimension D according to the sorting order S[D].

3. If D ≠ 1, do the following:

(a) Estimate P as the number of pages needed for storing the items in the array. This estimate is more difficult since the frame of reference for an individual page is needed to determine the compression ratio. Currently we estimate this number by retrieving all tuples to partition and using their frame of reference and the number of tuples to get a very accurate estimate of the number of pages.

(b) Set p := ⌊P^(1/D)⌋. This is the number of partitions on dimension D for the array.

(c) Define a list L of partition start locations, with L_j = ⌊j·k/p⌋ for 0 ≤ j ≤ p−1.

(d) Lower each value in L so that a particular L_j is the first occurrence of A[L_j][D] in A.

(e) Remove any duplicates from the L array. This and the previous step guarantee that dimensions aren't overcut, an important guarantee for low cardinality fields.

(f) For 0 ≤ j ≤ |L| − 2 do the following: sort(array starting at A[L_j], L_{j+1} − L_j, D − 1).

(g) sort(array starting at A[L_{|L|−1}], k − L_{|L|−1}, D − 1).

To perform bulk-loading of an entire tree to maximize compression, we use the function sort(A, |A|, D). Since the resulting array A is now sorted, we simply pack the pages of the leaf level maximally by introducing entries sequentially from the array until all entries are packed. This process is repeated for higher levels in the tree by using the center points of the resulting pages in the leaf level as input to the next level. Observe that both leaf and internal nodes are compressed in the resulting R-tree index.
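The following Python sketch mirrors the recursion above (0-based dimensions here, with the highest-numbered dimension cut first; the shared S list makes consecutive partitions alternate sort direction). The page-count estimator is passed in as a callable, since the paper's estimate depends on the group's frame of reference; the ceiling in the computation of p follows the worked example, and the rest is an illustrative assumption rather than the authors' code.

    import math

    def gbpack_sort(A, D, S, estimate_pages):
        # A: list of tuples; D: dimension to cut on this call (0-based);
        # S: per-dimension ascending/descending flags (shared across calls);
        # estimate_pages: callable returning the page estimate P for a group.
        S[D] = not S[D]                                   # alternate the sort direction
        A = sorted(A, key=lambda t: t[D], reverse=not S[D])
        if D == 0:
            return A                                      # last dimension: no more cuts
        k = len(A)
        P = estimate_pages(A)
        p = max(1, math.ceil(P ** (1.0 / (D + 1))))       # partitions along dimension D
        starts = []
        for j in range(p):
            s = j * k // p
            while s > 0 and A[s - 1][D] == A[s][D]:
                s -= 1                                    # never cut inside a run of equal values
            starts.append(s)
        starts = sorted(set(starts))                      # drop duplicate cut points
        bounds = starts + [k]
        out = []
        for lo, hi in zip(bounds, bounds[1:]):
            out.extend(gbpack_sort(A[lo:hi], D - 1, S, estimate_pages))
        return out

    # Example use (hedged): order 2-D points before packing them with pack_file,
    # estimating pages from the hypothetical compressed_size helper above.
    # ordered = gbpack_sort(points, D=1, S=[False, False],
    #                       estimate_pages=lambda g: max(1, -(-compressed_size(g) // PAGE_SIZE)))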

4 Performance evaluation

This section is a summary of an extensive performance evaluation of our techniques. The full details of the experiments and results are in [6]. These experiments include the compression of many widely varying data sets. When we say that we achieved "40% compression" we mean that the compressed data set is 40% the size of the uncompressed data set. Therefore, lower numbers are better.

Relational compression experiments. We studied both synthetic and real data sets. One real data set (the companies relation) consisted of ten attributes taken from the CompuStat stock market data set, which described companies on the stock exchange. We achieved between 18% compression (for two attributes) and 30% compression (for ten attributes). Another real data set (the sales relation) consisted of eleven attributes taken from a catalog sales company. Each tuple described a transaction with one customer. We applied low cardinality mapping (see Section 2.2) and achieved between 10% and 40% compression. A third real data set was the Tiger GIS data set for Orange County, California. We achieved 64% compression with no sorting and 54% with the GBPack algorithm. In all cases, our compression ratios were similar to gzip.

When performing experiments on synthetic data sets we observed the following trends:

• The range of values for attributes had a major influence on compression. When the range was small, we needed few bits for representing a value. Therefore, we got better compression.

• Page size had little effect on compression.

• The number of attributes had little effect on compression. We achieved slightly better compression for a lower number of attributes.

• The number of tuples had a logarithmic effect on compression. In general, when the number of tuples doubled, the number of bits needed for representing a tuple decreased by one.

• The data distribution had a major effect on compression. The worst distribution was the uniform distribution. When the data distribution was highly skewed (e.g., exponential distribution) most of the values on a page were in some small range and compression for those values was very high.

• Sorting the tuples increased compression significantly. However, the type of sort order had little influence. B-tree sort order seemed to achieve slightly better compression than GBPack sorting. Therefore, GBPack should be used only when constructing a multidimensional indexing structure.

CPU versus I/O costs. We studied the CPU cost associated with decompressing an entire compressed page. We compared our results to the CPU cost of gunzip used in a manner consistent with Sybase IQ. The data set used was the CompuStat relation. Our decompression technique was faster by a factor of between 8 and 20 (depending on the amount of compression achieved). Of particular importance was that our decompression technique was fast enough to keep up with sequential I/O on fast disks (we typically achieved 16 Mbytes per second when decompressing all the contents of a page). gunzip could not keep up with sequential I/O.

Comparison with techniques found in commercial systems. We implemented a compression technique similar to Sybase IQ compression. We used gzip and gunzip on blocks of data. Each block had a size of 64 Kbytes. Note that Sybase IQ uses a proprietary compression algorithm that is a variant of the Lempel-Ziv algorithm (gzip is another variant of that algorithm). We expect the Sybase IQ algorithm to be more efficient.

On a low cardinality sales data set we achieved 1% compression, while Sybase achieved 10%. On a medium cardinality sales data set we achieved 37% compression while Sybase achieved 23%. On the CompuStat data set (which was a mix of low, medium and high cardinality) we achieved 30% compression while Sybase achieved 29%.

Importance of tuple level decompression. Our decompression algorithm can be used for decompressing single tuples and even single fields. If we only need a small amount of information from a compressed page, the CPU throughput increases by orders of magnitude.

For example, consider performing an index nested loops join where the inner relation's index can fit in memory in compressed form. For each tuple from the outer relation we need to probe the index and decompress only the relevant tuples from the inner relation. The cost of this join is dominated by the CPU cost since all the index probes are done in memory. Our decompression can be two to three orders of magnitude faster than gunzip in such cases.


Another possible scenario occurs in a multiuser system, when a relation fits in memory in compressed form, but does not fit in memory in uncompressed form. In this case it is possible for many users to access random tuples from the relation.

R-tree compression experiments. The space required by an R-tree or B-tree is slightly larger than the space required for the leaf level of the tree. Therefore, the relational compression results hold for R-trees and B-trees as well. In this section we studied the impact of compression on the performance of R-trees.

We created R-trees for the three real data sets described earlier in this section. We performed point queries, partial match queries (the selection criterion specifies a range for a subset of the attributes) and range queries (the selection criterion specifies a range for all attributes). We measured the amount of I/O needed for several queries on both compressed and uncompressed R-trees.

For the sales data set, the average I/O cost using the compressed R-tree was between 35% and 60% of the average I/O cost using the uncompressed R-tree. For the CompuStat data set it was between 10% and 60%. For the Tiger data set it was between 45% and 70%.

5 Related work

Ng and Ravishankar [13] discussed a compression scheme that is similar in some respects to our work. In particular, their paper did the following:

• Introduced a page level compression/decompression algorithm for relational data.

• Explored the use of a B-tree sort order over the compressed data as well as using actual B-trees over the data.

Note, however, that the details of their compression scheme are quite different:

• Our scheme decompresses on a per field, per tuple basis, not a per page basis.

• Except for the actual information in the tuples, we store extra information only on a per page basis. Their compression technique uses run length encoding, which stores extra information on a per field, per tuple basis.

• Our compression scheme is easily adapted to be lossy, which is important for compressing index pages in R-trees.

Some scenarios that highlight the differences between our schemes are:

• Multiuser workloads that randomly access individual tuples on pages. Clearly, our approach would be superior in this case.

• Performing a range query using a linear scan over the data. Since the bounding box for the entire page is stored on each data page, our scheme can check to see if the entries on the page need to be examined.

• Performing small probes on a B-tree. Using our compression scheme, a binary search can be used to search on a page, since each record is of fixed length. Note that this becomes a serious issue in a compressed environment, where many more entries can fit on a page.

Additionally, we demonstrated the application of multidimensional bulk loading to compression, and presented a range of performance results that strongly argue for the use of compression in a database context.

[5, 16, 1] discuss several compression techniques such as run length encoding, header compression, encoding category values, order preserving compression, Huffman encoding, Lempel-Ziv, differencing, prefix and postfix compression, none of which support random access to tuples within a page. The above techniques, unlike ours, handle any kind of data, but introduce buffer and storage management problems.

[15] discusses several query evaluation algorithms based on the use of compression. While this paper assumes gzip compression is used, our techniques could be used as well in most of the examples discussed there.

6 Conclusions and future work

This paper presents a new compression algorithm and demonstrates its effectiveness on relational database pages. Compression ratios of between 3 and 4 to 1 seemed typical on real datasets. Low cardinality datasets in particular produced compression ratios as high as 88 to 1. Decompression costs are surprisingly low. In fact, the CPU cost of decompressing a relation was approximately 1/10 the CPU cost of gunzip over the same relation, while the achieved compression ratios were comparable. This difference in CPU costs means that the CPU decompression cost becomes much less than the sequential I/O cost, whereas it was earlier higher than the sequential I/O cost. Further, if only a single tuple is required, just that tuple (or even field) can be decompressed at orders of magnitude lower cost than decompressing the entire page. This makes it feasible to store pages in the buffer pool in compressed form; when a tuple on the page is required, it can be extracted very fast. To our knowledge no other compression algorithm allows decompression on a per-tuple basis.

The compression code is localized in the code that manages tuples on individual pages, making it easy to integrate it into an existing DBMS. This, together with the simplifications it offers in keeping compressed pages in the buffer pool (also leading to much better utilization of the buffer pool), makes it attractive from an implementation standpoint. A related point is that by applying it to index pages that contain ⟨key, rid⟩ pairs, we can obtain the benefits of techniques for storing ⟨key, rid-list⟩ pairs with specialized rid representations that exploit "runs" of rids.

In comparison to techniques like gzip compression, our algorithm has the disadvantage that it compresses only numeric fields (low cardinality fields of other types can be mapped into numeric fields, and in fact, this is often done anyway since it also improves the compression attained by gzip). Note, however, that it can be applied to files containing a combination of numeric and non-numeric fields: it will then achieve compression on just the numeric fields. Nonetheless, the range of applicability is quite broad; as an example, fact tables in data warehouses, which contain the bulk of warehoused data, contain many numeric and low-cardinality fields, and no long text fields.

We also explored the relationship between sorting and compressibility in detail. Among the sorts explored were sorts suitable for bulk loading multidimensional indexing structures and B-trees. The important conclusion is that both sorts worked equally well, which implies that compression will work well on both linear and multidimensional indexes, and both are significantly better than no sort. The latter observation underscores the importance of sorting data prior to compression, if it is not already at least approximately sorted.

7 Acknowledgements

We would like to give a warm acknowledgement to Kevin Beyer for his time as a sounding board and for asking such questions as "Can you support primary indexes?" We would also like to thank Clark French and Sybase for providing us with the details concerning their compression techniques. In addition, we would like to thank Patrick O'Neil for valuable feedback; in addressing his questions, we significantly strengthened this paper.

References

[1] M. A. Bassiouni, "Data Compression in Scientific and Statistical Databases", IEEE Transactions on Software Engineering, Vol. SE-11, No. 10, pp. 1047-1058, 1985.

[2] R. Bayer and E. McCreight, "Organization and maintenance of large ordered indexes", Acta Informatica, Vol. 1, pp. 173-189, 1972.

[3] N. Beckmann, H.-P. Kriegel, R. Schneider and B. Seeger, "The R*-Tree: An Efficient and Robust Access Method for Points and Rectangles", in ACM SIGMOD, pp. 322-331.

[4] S. Berchtold, D. A. Keim and H.-P. Kriegel, "The X-Tree: An Index Structure for High Dimensional Data", in VLDB 1996, pp. 28-39.

[5] S. J. Eggers, F. Olken and A. Shoshani, "A Compression Technique for Large Statistical Databases", in VLDB 1981, pp. 424-434.

[6] J. Goldstein, R. Ramakrishnan and U. Shaft, "Compressing Relations and Indexes", Technical Report No. 1366, CS Dept., University of Wisconsin-Madison, December 1997.

[7] A. Guttman, "R-Trees: A Dynamic Index Structure for Spatial Searching", in ACM SIGMOD 1984, pp. 47-57.

[8] R. Kimball, The Data Warehouse Toolkit, John Wiley and Sons, 1996.

[9] H.-P. Kriegel et al., "The Buddy-Tree: An Efficient and Robust Method for Spatial Data Base Systems", in VLDB 1990, pp. 590-601.

[10] A. Lempel and J. Ziv, "A Universal Algorithm for Sequential Data Compression", IEEE Transactions on Information Theory, Vol. 23, No. 3, pp. 337-343, 1977.

[11] S. T. Leutenegger et al., "STR: A Simple and Efficient Algorithm for R-Tree Packing", Technical Report No. 96-02, Mathematics and Computer Science Dept., University of Denver, 1996.

[12] K.-I. Lin, H. V. Jagadish and C. Faloutsos, "The TV-Tree: An Index Structure for High-Dimensional Data", VLDB Journal, Vol. 3, No. 4, pp. 517-542, 1994.

[13] W. K. Ng and C. V. Ravishankar, "Relational Database Compression Using Augmented Vector Quantization", in IEEE 11th International Conference on Data Engineering, pp. 540-549, 1995.

[14] J. Nievergelt, H. Hinterberger and S. C. Sevcik, "The Grid File: An Adaptable, Symmetric Multikey File Structure", in Readings in Database Systems, Morgan Kaufmann, 1988.

[15] P. O'Neil and D. Quass, "Improved Query Performance with Variant Indexes", in ACM SIGMOD 1997, pp. 38-49.

[16] M. A. Roth and S. J. Van Horn, "Database Compression", SIGMOD Record, Vol. 22, No. 3, pp. 31-39, 1993.