
The Data Calculator∗: Data Structure Design and Cost Synthesis from First Principles and Learned Cost Models

Stratos Idreos, Kostas Zoumpatianos, Brian Hentschel, Michael S. Kester, Demi Guo
Harvard University

ABSTRACT

Data structures are critical in any data-driven scenario, but they are notoriously hard to design due to a massive design space and the dependence of performance on workload and hardware, which evolve continuously. We present a design engine, the Data Calculator, which enables interactive and semi-automated design of data structures. It brings two innovations. First, it offers a set of fine-grained design primitives that capture the first principles of data layout design: how data structure nodes lay data out, and how they are positioned relative to each other. This allows for a structured description of the universe of possible data structure designs that can be synthesized as combinations of those primitives. The second innovation is computation of performance using learned cost models. These models are trained on diverse hardware and data profiles and capture the cost properties of fundamental data access primitives (e.g., random access). With these models, we synthesize the performance cost of complex operations on arbitrary data structure designs without having to: 1) implement the data structure, 2) run the workload, or even 3) access the target hardware. We demonstrate that the Data Calculator can assist data structure designers and researchers by accurately answering rich what-if design questions on the order of a few seconds or minutes, i.e., computing how the performance (response time) of a given data structure design is impacted by variations in the: 1) design, 2) hardware, 3) data, and 4) query workloads. This makes it effortless to test numerous designs and ideas before embarking on lengthy implementation, deployment, and hardware acquisition steps. We also demonstrate that the Data Calculator can synthesize entirely new designs, auto-complete partial designs, and detect suboptimal design choices.

Let us calculate. — Gottfried Leibniz

ACM Reference Format:
Stratos Idreos, Kostas Zoumpatianos, Brian Hentschel, Michael S. Kester, Demi Guo. 2018. The Data Calculator: Data Structure Design and Cost Synthesis from First Principles and Learned Cost Models. In Proceedings of 2018 International Conference on Management of Data (SIGMOD'18). ACM, New York, NY, USA, 16 pages. https://doi.org/10.1145/3183713.3199671


Figure 1: The concept of the Data Calculator: computing data access method designs as combinations of a small set of primitives. (Drawing inspired by a figure in the Ph.D. thesis of Gottfried Leibniz, who envisioned an engine that calculates physical laws from a small set of primitives [52].) [The figure shows data layout primitives (key partitioning, key order, fanout, zone-map filters, immediate node links) and data access primitives (random access, serial scan, binary search) combining, through design and cost synthesis, into data structures such as Hash Table, B-Tree, Trie, and new designs.]

1 FROM MANUAL TO INTERACTIVE DESIGN

The Importance of Data Structures. Data structures are at the core of any data-driven software, from relational database systems, NoSQL key-value stores, operating systems, compilers, HCI systems, and scientific data management to any ad-hoc program that deals with increasingly growing data. Any operation in any data-driven system/program goes through a data structure whenever it touches data. Any effort to rethink the design of a specific system or to add new functionality typically includes (or even begins by) rethinking how data should be stored and accessed [1, 9, 33, 38, 51, 75, 76]. In this way, the design of data structures has been an active area of research since the onset of computer science, and there is an ever-growing need for alternative designs. This is fueled by 1) the continuous advent of new applications that require tailored storage and access patterns in both industry and science, and 2) new hardware that requires specific storage and access patterns to ensure longevity and maximum utilization. Every year dozens of new data structure designs are published, e.g., more than fifty new designs appeared at ACM SIGMOD, PVLDB, EDBT and IEEE ICDE in 2017 according to data from DBLP.

A Vast and Complex Design Space. A data structure design consists of 1) a data layout to describe how data is stored, and 2) algorithms that describe how its basic functionality (search, insert, etc.) is achieved over the specific data layout. A data structure can be as simple as an array or arbitrarily complex, using sophisticated combinations of hashing, range and radix partitioning, careful data placement, compression and encodings. Data structures may also be referred to as "data containers" or "access methods" (in which case the term "structure" applies only to the layout). The data layout

∗The name "Calculator" pays tribute to the early works that experimented with the concept of calculating complex objects from a small set of primitives [52].

design itself may be further broken down into the base data layout and the indexing information which helps navigate the data, i.e., the leaves of a B+tree and its inner nodes, or the buckets of a hash table and the hash-map. We use the term data structure design throughout the paper to refer to the overall design of the data layout, indexing, and the algorithms together as a whole.

We define "design" as the set of all decisions that characterize the layout and algorithms of a data structure, e.g., "Should data nodes be sorted?", "Should they use pointers?", and "How should we scan them exactly?". The number of possible valid data structure designs explodes to ≈ 10^32 even if we limit the overall design to only two different kinds of nodes (e.g., as is the case for B+trees). If we allow every node to adopt different design decisions (e.g., based on access patterns), then the number of designs grows to ≈ 10^100.¹

We explain how we derive these numbers in Section 2.

The Problem: Human-Driven Design Only. The design of data structures is a slow process, relying on the expertise and intuition of researchers and engineers who need to mentally navigate the vast design space. For example, consider the following design questions.

(1) We need a data structure for a specific workload: Should we strip down an existing complex data structure? Should we build off a simpler one? Or should we design and build a new one from scratch?

(2) We expect that the workload might shift (e.g., due to new application features): How will performance change? Should we redesign our core data structures?

(3) We add flash drives with more bandwidth and also add more system memory: Should we change the layout of our B-tree nodes? Should we change the size ratio in our LSM-tree?

(4) We want to improve throughput: How beneficial would it be to buy faster disks? More memory? Or should we invest the same budget in redesigning our core data structure?

This complexity leads to a slow design process and has severe cost side-effects [12, 22]. Time to market is of extreme importance, so new data structure design effectively stops when a design "is due" and only rarely when it "is ready". Thus, the process of design extends beyond the initial design phase to periods of reconsidering the design given bugs or changes in the scenarios it should support. Furthermore, this complexity makes it difficult to predict the impact of design choices, workloads, and hardware on performance. We include two quotes from a systems architect with more than two decades of experience with relational systems and key-value stores.

(1) "I know from experience that getting a new data structure into production takes years. Over several years, assumptions made about the workload and hardware are likely to change, and these changes threaten to reduce the benefit of a data structure. This risk of change makes it hard to commit to multi-year development efforts. We need to reduce the time it takes to get new data structures into production."

(2) "Another problem is the limited ability we have to iterate. While some changes only require an online schema change, many require a dump and reload for a data service that might be running 24x7. The budget for such changes is limited. We can overcome the limited budget with tools that help us determine the changes most likely to be useful. Decisions today are frequently based on expert opinions, and these experts are in short supply."

¹For comparison, the estimated number of stars in the universe is 10^24.

Vision Step 1: Design Synthesis from First Principles. We propose a move toward the new design paradigm captured in Figure 1. Our intuition is that most designs (and even inventions) are about combining a small set of fundamental concepts in different ways or tunings. If we can describe the set of the first principles of data structure design, i.e., the core design principles out of which all data structures can be drawn, then we will have a structured way to express all possible designs we may invent, study, and employ as combinations of those principles. An analogy is the periodic table of elements in chemistry. It classifies elements based on their atomic number, electron configuration, and recurring chemical properties. The structure of the table allows one to understand the elements and how they relate to each other, but crucially it also enables arguing about the possible design space; for more than one hundred years since the inception of the periodic table in the 19th century, we keep discovering elements that are predicted (synthesized) by the "gaps" in the table, accelerating science.

Our vision is to build the periodic table of data structures so we can express their massive design space. We take the first step in this paper, presenting a set of first principles that can synthesize orders of magnitude more data structure designs than what has been published in the literature. It captures basic hardware-conscious layouts and read operations; future work includes extending the table for additional parts of the design space, such as updates, concurrency, compression, adaptivity, and security.

Vision Step 2: Cost Synthesis from Learned Models. The second step in our vision is to accelerate and automate the design process. Key here is being able to argue about the performance behavior of the massive number of designs so we can rank them. Even with an intuition that a given design is an excellent choice, one has to implement the design, and test it on a given data and query workload and on specific hardware. This process can take weeks at a time and has to be repeated when any part of the environment changes. Can we accelerate this process so we can quickly test alternative designs (or different combinations of hardware, data, and queries) on the order of a few seconds? If this is possible, then we can 1) accelerate design and research of new data structures, and 2) enable new kinds of adaptive systems that can decide core parts of their design, and the right hardware.

Arguing formally about the performance of diverse designs is a notoriously hard problem [13, 58, 72, 75, 77, 78], especially as workload and hardware properties change; even if we can come up with a robust analytical model, it may soon be obsolete [43]. We take a hybrid route using a combination of analytical models, benchmarks, and machine learning for a small set of fundamental access primitives. For example, all pointer-based data structures need to perform random accesses as operations traverse their nodes. All data structures need to perform a write during an update operation, regardless of the exact update strategy. We synthesize the cost of complex operations out of models that describe those simpler, more fundamental operations, inspired by past work on generalized models [58, 75]. In addition, our models start out as analytical models since we know how these primitives will likely behave. However, they are also trained across diverse hardware profiles by running benchmarks that isolate the behavior of those primitives. This way, we learn a set of coefficients for each model that capture the subtle performance details of diverse hardware settings.

Figure 2: The architecture of the Data Calculator: From high-level layout specifications to performance cost calculation. [The figure shows the pipeline: inputs (structure layout specification, data and query workload, hardware profile); a data node (element) library (B-tree internal, data page, B+tree internal, linked list, trie, skip list, array) built from 46 data layout primitives (Sec. 2); data access primitives (serial scan, sorted search, random probe, equality scan, range scan) with learned cost models, e.g., f(x) = ax + b and sums of sigmoids, trained via micro-benchmarks on different hardware profiles using machine learning (Sec. 3); operation synthesis (Level 1) and hardware-conscious cost synthesis (Level 2) for operations such as Get Range and Bulk Load, with Level 1 to Level 2 translation; and what-if design (Sec. 4), supporting what-if questions, auto-completion, and locating bad designs via combination validity rules, pruning, memoization, and iterative search over next node/cost evaluations. Outputs: latency and full design, layout and access (AST).]

The Data Calculator: Automated What-if Design. We present a "design engine" — the Data Calculator — that can compute the performance of arbitrary data structure designs as combinations of fundamental design primitives. It is an interactive tool that accelerates the process of design by turning it into an exploration process, improving the productivity of researchers and engineers; it is able to answer what-if data structure design questions to understand how the introduction of new design choices, workloads, and hardware affects the performance (latency) of an existing design. It currently supports read queries for basic hardware-conscious layouts. It allows users to give as input a high-level specification of the layout of a data structure (as a combination of primitives), in addition to workload and hardware specifications. The Data Calculator gives as output a calculation of the latency to run the input workload on the input hardware. The architecture and components of the Data Calculator are captured in Figure 2 (from left to right): (1) a library of fine-grained data layout primitives that can be combined in arbitrary ways to describe data structure layouts; (2) a library of data access primitives that can be combined to generate designs of operations; (3) an operation and cost synthesizer that computes the design of operations and their latency for a given data structure layout specification, a workload and a hardware profile; and (4) a search component that can traverse part of the design space to supplement a partial data structure specification or inspect an existing one with respect to both the layout and the access design choices.

Inspiration. Our work is inspired by several lines of work across many fields of computer science. John Ousterhout's project Magic in the area of computer architecture allows for quick verification of transistor designs so that engineers can easily test multiple designs [62]. Leland Wilkinson's "grammar of graphics" provides structure and formulation on the massive universe of possible graphics one can design [74]. Mike Franklin's Ph.D. thesis explores the possible client-server architecture designs using caching-based replication as the main design primitive and proposes a taxonomy that produced both published and unpublished (at the time) cache consistency algorithms. Joe Hellerstein's work on Generalized Search Indexes [6, 7, 38, 47-50] makes it easy to design and test new data structures by providing templates that significantly minimize implementation time. S. Bing Yao's work on generalized cost models [75] for database organizations, and Stefan Manegold's work on generalized cost models tailored for the memory hierarchy [57], showed that it

is possible to synthesize the costs of database operations from basic access patterns and based on hardware performance properties. Work on data representation synthesis in programming languages [15, 18-21, 24-27] enables selection and synthesis of representations out of small sets of (3-5) existing data structures. The Data Calculator can be seen as a step toward the Automatic Programmer challenge set by Jim Gray in his Turing award lecture [35], and as a step toward the "calculus of data structures" challenge set by Turing award winner Robert Tarjan [71]: "What makes one data structure better than another for a certain application? The known results cry out for an underlying theory to explain them."
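The excerpt names the four components but not a programmatic interface, so the following C++ sketch of how they might compose is purely illustrative: every type and function name is an assumption, and the bodies are stubs.

#include <string>
#include <vector>

// Hypothetical types mirroring Figure 2's inputs and outputs.
struct LayoutSpec      { std::vector<std::string> elements; };  // primitive combinations
struct Workload        { std::vector<long> keys, queries; };
struct HardwareProfile { std::string name; };  // selects trained model coefficients
struct CostReport      { double latency_sec; std::string design_ast; };

// (3) Operation and cost synthesizer: layout + workload + hardware -> latency.
CostReport synthesize_cost(const LayoutSpec&, const Workload&, const HardwareProfile&) {
  return {0.0, ""};  // stub; Sections 3-4 describe the real computation
}

// (4) Search component: complete a partial specification by ranking candidates.
LayoutSpec auto_complete(LayoutSpec partial, const Workload&, const HardwareProfile&) {
  return partial;    // stub
}

int main() { return 0; }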

Contributions. Our contributions are as follows:

(1) We introduce a set of data layout design primitives that capture the first principles of data layouts, including hardware-conscious designs that dictate the relative positioning of data structure nodes (§2).

(2) We show how combinations of the design primitives can describe known data structure designs, including arrays, linked-lists, skip-lists, queues, hash-tables, binary trees and (cache-conscious) b-trees, tries, MassTree, and FAST (§2).

(3) We show that in addition to known designs, the design primitives form a massive space of possible designs that has only been minimally explored in the literature (§2).

(4) We show how to synthesize the latency cost of basic operations (point and range queries, and bulk loading) of arbitrary data structure designs from a small set of access primitives. Access primitives represent fundamental ways to access data and come with learned cost models which are trained on diverse hardware to capture hardware properties (§3).

(5) We show how to use cost synthesis to interactively answer complex what-if design questions, i.e., the impact of changes to design, workload, and hardware (§4).

(6) We introduce a design synthesis algorithm that completes partial layout specifications given a workload and hardware input; it utilizes cost synthesis to rank designs (§4).

(7) We demonstrate that the Data Calculator can accurately compute the performance impact of design choices for state-of-the-art designs and diverse hardware (§5).

(8) We demonstrate that the Data Calculator can accelerate the design process by answering rich design questions in a matter of seconds or minutes (§5).

2 DATA LAYOUT PRIMITIVES AND STRUCTURE SPECIFICATIONS

In this section, we discuss the library of data layout design primitives and how it enables the description of a massive number of both known and unknown data structures.

Data Layout Primitives. The Data Calculator contains a small set of design primitives that represent fundamental design choices when constructing a data structure layout. Each primitive belongs to a class of primitives depending on the high-level design concept it refers to, such as node data organization, partitioning, node physical placement, and node metadata management. Within each class, individual primitives define design choices and allow for alternative tunings. The complete set of primitives we introduce in this paper is shown in Figure 11 in the appendix; they describe basic data layouts and cache-conscious optimizations for reads. For example, "Key Order (none|sorted|k-ary)" defines how data is laid out in a node. Similarly, "Key Retention (none|full|func)" defines whether and how keys are included in a node. In this way, in a B+tree all nodes use "sorted" for order maintenance, while internal nodes use "none" for key retention as they only store fences and pointers, and leaf nodes use "full" for key retention.
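To make the two quoted signatures concrete, here is a minimal C++ sketch that encodes "Key Order" and "Key Retention" as enumerated domains and instantiates the B+tree settings described above. The struct and variable names are illustrative assumptions, not the Calculator's actual encoding.

#include <cstdio>

// Two of the paper's layout primitives, with the domains quoted in the text.
enum class KeyOrder     { None, Sorted, KAry };
enum class KeyRetention { None, Full, Func };

// A (partial) node layout as a combination of primitive values.
struct NodeLayout {
  KeyOrder     key_order;
  KeyRetention key_retention;
};

// B+tree as described: all nodes sorted; internal nodes retain no keys
// (fences and pointers only), leaf nodes retain full keys.
constexpr NodeLayout bplus_internal{KeyOrder::Sorted, KeyRetention::None};
constexpr NodeLayout bplus_leaf{KeyOrder::Sorted, KeyRetention::Full};

int main() {
  std::printf("internal retains keys: %s\n",
              bplus_internal.key_retention == KeyRetention::None ? "no" : "yes");
  return 0;
}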

The logic we use to generate primitives is that each one should represent a fundamental design concept that does not break down into more useful design choices (otherwise, there will be parts of the design space we cannot express). Coming up with the set of primitives is a trial-and-error task to map the known space of design concepts to an as clean and elegant set of primitives as possible.

Naturally, not all layout primitives can be combined. Most invalid relationships stem from the structure of the primitives, i.e., each primitive combines with every other standalone primitive. Only a few pairs of primitive tunings do not combine, which generates a small set of invalidation rules. These are mentioned in Figure 11.

From Layout Primitives to Data Structures. To describe complete data structures, we introduce the concept of elements. An element is a full specification of a single data structure node; it defines the data and access methods used to access the node's data. An element may be "terminal" or "non-terminal". That is, an element may be describing a node that further partitions data to more nodes or not. This is done with the "fanout" primitive whose value represents the maximum number of children that would be generated when a node partitions data. Or it can be set to "terminal", in which case its value represents the capacity of a terminal node. A data structure specification contains one or more elements. It needs to have at least one terminal element, and it may have zero or more non-terminal elements. Each element has a destination element (except terminal ones) and a source element (except the root). Recursive connections are allowed to the same element.

Examples. A visualization of the primitives can be seen at the left side of Figure 3. It is a flat representation of the primitives shown in Figure 11 which creates an entry for every primitive signature. The radius depicts the domain of each primitive, but different primitives may have different domains, visually depicted via the multiple inner circles in the radar plots of Figure 3. The small radar plots on the right side of Figure 3 depict descriptions of nodes of known data structures as combinations of the base primitives. Even visually it starts to become apparent that state-of-the-art designs which are meant to handle different scenarios are "synthesized from the same pool of design concepts". For example, using the non-terminal B+tree element and the terminal sorted data page element we can construct a full B+tree specification; data is recursively broken down into internal nodes using the B+tree element until we reach the leaf level, i.e., when partitions reach the terminal node size. Figure 3 also depicts Trie and Skip-list specifications. Figure 11 provides complete specifications of Hash-table, Linked-list, B+tree, Cache-conscious B-tree, and FAST.

Elements "Without Data". For flat data structures without an indexing layer, e.g., linked-lists and skip-lists, there need to be elements in the specification that describe the algorithm used to navigate the terminal nodes. Given that this algorithm is effectively a model, it does not rely on any data, and so such elements do not translate to actual nodes; they only affect algorithms that navigate across the terminal nodes. For example, a linked-list element in Figure 11 describes that data is divided into nodes that can only be accessed via following the links that connect terminal nodes. Similarly, one can create complex hierarchies of non-terminal elements that do not store any data but instead their job is to synthesize a collective model of how the keys should be distributed in the data structure, e.g., based on their value or other properties of the workload. These elements may lead to multiple hierarchies of both non-terminal nodes with data and terminal ones, synthesizing data structure designs that treat parts of the data differently. We will see such examples in the experimental analysis.

Recursive Design Through Blocks. A block is a logical portion of the data that we divide into smaller blocks to construct an instance of a data structure specification. The elements in a specification are the "atoms" with which we construct data structure instances by applying them recursively onto blocks. Initially, there is a single block of data, all data. Once all elements have been applied, the original block is broken down into a set of smaller blocks that correspond to the internal nodes (if any) and the terminal nodes of the data structure. Elements without data can be thought of as if they apply on a logical data block that represents part of the data with a set of specific properties (i.e., all data if this is the first element) and partition the data with a particular logic into further logical blocks or physical nodes. This recursive construction is used when we test, cost, and search through multiple possible designs concurrently over the same data for a given workload and hardware, as we will discuss in the next two sections, but it is also helpful to visualize designs as if "data is pushed through the design" based on the elements and logical blocks.

Cache-Conscious Designs. One critical aspect of data structure design is the relative positioning of its nodes, i.e., how "far" each node is positioned with respect to its predecessors and successors in a query path. This aspect is critical to the overall cost of traversing a data structure. The Data Calculator design space allows to dictate how nodes should be positioned explicitly: each non-terminal element defines how its children are positioned physically with respect to each other and with respect to the current node. For example, setting the layout primitive "Sub-block physical layout" to BFS tells the current node that its children are laid out sequentially. In addition, setting the layout primitive "Sub-blocks homogeneous" to true implies that all its children have the same layout (and therefore fixed width), and allows a parent node to access any of its children

Figure 3: The data layout primitives and examples of synthesizing node layouts of state-of-the-art data structures. [Radar plots over the node layout primitives (e.g., blockAccess.direct/headLink/tailLink, bloom.active/hashFunctionsNum/numberOfBits, capacity.type/value, external.links, fanout.type/fixedValue, filters.zoneMaps.min/max/exact, keyRetention.type, links.next/prev/skipLinks, log.initialRunSize/maxRunsPerLevel/mergeFactor, orderMaintenance.type, partitioning.type, sub-block physical layout/location/homogeneity, utilization.constraint, valueRetention.type) depict non-terminal elements such as B+tree, B-tree, Trie, Skip-list, LSM, linked list, and range and hash partitioning, and terminal elements such as sorted, unsorted, compressed, and LSM data pages.]

nodes directly with a single pointer and reference number. This, in turn, makes it possible to fit more data in internal nodes because only one pointer is needed and thus more fences can be stored within the same storage budget. Such primitives allow specifying designs such as Cache-Conscious B+tree [67] (Figure 11 provides the complete specification), but also the possibility of generalizing the optimizations made there to arbitrary structures.
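As a rough illustration of an element (the full specification of one node type), the sketch below encodes a few of the primitives that appear in Figures 3 and 5 and instantiates the two-element design discussed above: a non-terminal B+tree-like internal element and a terminal sorted data page. All field names and values are illustrative assumptions, not the Calculator's real specification format.

#include <cstdio>

enum class FanoutType { Fixed, Function, Terminal };

// An element: one node specification drawn from the layout primitives.
struct Element {
  FanoutType fanout_type;
  int        fanout_or_capacity;  // children if non-terminal, node capacity if terminal
  bool       sorted;              // order maintenance
  bool       retains_data;        // false: stores only fences and pointers
  bool       zone_maps_min;       // zone-map filter on minimum values
};

// Two-element B+tree-like design: data is partitioned recursively by the
// internal element until partitions reach the terminal page capacity.
const Element bplus_internal{FanoutType::Fixed, 20, true, false, true};
const Element sorted_data_page{FanoutType::Terminal, 250, true, true, false};

int main() {
  std::printf("internal fanout=%d, terminal capacity=%d\n",
              bplus_internal.fanout_or_capacity, sorted_data_page.fanout_or_capacity);
  return 0;
}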

Similarly, we can describe FAST [44]. First, we set "Sub-block physical location" to inline, specifying that the children nodes are directly after the parent node physically. Second, we set that the children nodes are homogeneous, and finally, we set that the children have a sub-block layout of "BFS Layer List (Page Size / Cache Line Size)". Here, the BFS layer list specifies that at a higher level, we should have a BFS layout of sub-trees containing Page Size / Cache Line Size layers; however, inside of those sub-trees pages are laid out in BFS manner by a single level. The combination matches the combined page-level blocking and cache-line-level blocking of FAST. Additionally, the Data Calculator realizes that all child node physical locations can be calculated via offsets, and so eliminates all pointers. Figure 11 provides the complete specification.

Size of the Design Space. To help with arguing about the possible design space we provide formal definitions of the various constructs.

Definition 2.1 (Data Layout Primitive). A primitive $p_i$ belongs to a domain of values $P_i$ and describes a layout aspect of a data structure node.

Definition 2.2 (Data Structure Element). A Data Structure Element $E$ is defined as a set of data layout primitives $E = \{p_1, \ldots, p_n\} \in P_1 \times \ldots \times P_n$ that uniquely identify it.

Given a set of $inv(P)$ invalid combinations, the set of all possible elements $\mathcal{E}$ (i.e., node layouts) that can be designed as distinct combinations of data layout primitives has the following cardinality:

$$|\mathcal{E}| = |P_1 \times \ldots \times P_n| - inv(P) = \prod_{P_i} |P_i| - inv(P) \quad (1)$$

Definition 2.3 (Blocks). Each non-terminal element $E \in \mathcal{E}$, applied on a set of data entries $D \subseteq \mathcal{D}$, uses a function $B_E(D) = \{D_1, \ldots, D_f\}$ to divide $D$ into $f$ blocks such that $D_1 \cup \ldots \cup D_f = D$.

A polymorphic design, where every block may be described by a different element, leads to the following recursive formula for the cardinality of all possible designs:

$$c_{poly}(D) = |\mathcal{E}| + \sum_{E \in \mathcal{E}} \; \sum_{D_i \in B_E(D)} c_{poly}(D_i) \quad (2)$$

Example: A Vast Space of Design Opportunities. To get insight into the possible total designs we make a few simplifying assumptions. Assume the same fanout $f$ across all nodes and terminal node size equal to the page size $psize$. Then $N = \lceil |D|/psize \rceil$ is the total number of pages in which we can divide the data, and $h = \lceil \log_f(N) \rceil$ is the height of the hierarchy. We can then approximate the result of Equation 2 by considering that we have $|\mathcal{E}|$ possibilities for the root element, and $f \cdot |\mathcal{E}|$ possibilities for its resulting partitions, which in turn have $f \cdot |\mathcal{E}|$ possibilities each, up to the maximum level of recursion $h = \lceil \log_f(N) \rceil$. This leads to the following result:

$$c_{poly}(D) \approx |\mathcal{E}| \cdot (f \cdot |\mathcal{E}|)^{\lceil \log_f(N) \rceil} \quad (3)$$

Most sophisticated data structure designs use only two distinct elements, each one describing all nodes across groups of levels of the structure, e.g., B-tree designs use one element for all internal nodes and one for all leaves. This gives the following design space for most standard designs:

$$c_{stan}(D) \approx |\mathcal{E}|^2 \quad (4)$$

Using Equations 1, 3 and 4 we can get estimations of the possible design space for different kinds of data structure designs. For example, given the existing library of data layout primitives, and by limiting the domain of each primitive as shown in Figure 11 in the appendix, from Equation 1 we get $|\mathcal{E}| = 10^{16}$, meaning we can describe data structure layouts from a design space of 10^16 possible node elements and their combinations. This number includes only valid combinations of layout primitives, i.e., all invalid combinations as defined by the rules in Figure 11 are excluded. Thus, we have a design space of 10^32 for standard two-element structures (e.g., where B-tree and Trie belong) and 10^48 for three-element structures (e.g., where MassTree [59] and Bounded-Disorder [55] belong). For polymorphic structures, the number of possible designs grows more quickly, and it also depends on the size of the training data used to find a specification, e.g., it is > 10^100 for 10^15 keys.

The numbers in the above example highlight that data structure design is still a wide-open space with numerous opportunities for innovative designs as data keeps growing, application workloads keep changing, and hardware keeps evolving. Even with hundreds of new data structures manually designed and published each year, this is a slow pace to test all possible designs and to be able to argue about how the numerous designs compare. The Data Calculator is a tool that accelerates this process by 1) providing guidance about what is the possible design space, and 2) allowing to quickly test how a given design fits a workload and hardware setting. A technical report includes a more detailed description of its primitives [23].
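These estimates can be checked with back-of-the-envelope arithmetic. The sketch below evaluates Equations (3) and (4) in log10 space (the counts overflow any native type), using |E| = 10^16 from Equation (1) and the illustrative fanout and page size used in the text's examples; it reproduces the ~10^32 figure for two-element designs and comfortably exceeds the > 10^100 reported for polymorphic designs over 10^15 keys.

#include <cmath>
#include <cstdio>

int main() {
  const double logE = 16.0;   // log10 |E|, from Equation (1)
  const double f    = 20.0;   // fanout (illustrative)
  const double keys = 1e15, page_capacity = 250.0;

  const double N = std::ceil(keys / page_capacity);        // terminal pages
  const double h = std::ceil(std::log(N) / std::log(f));   // ceil(log_f N)

  const double log_c_stan = 2.0 * logE;                         // Equation (4)
  const double log_c_poly = logE + h * (std::log10(f) + logE);  // Equation (3)

  std::printf("standard two-element designs: ~10^%.0f\n", log_c_stan);
  std::printf("polymorphic designs:          ~10^%.0f\n", log_c_poly);
  return 0;
}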

3 DATA ACCESS PRIMITIVES AND COST SYNTHESIS

We now discuss how the Data Calculator computes the cost (latency) of running a given workload on given hardware for a particular data structure specification. Traditional cost analysis in systems and data structures happens through experiments and the development of analytical cost models. Both options are not scalable when we want to quickly test multiple different parts of the massive design space we define in this paper. They require significant expertise and time, while they are also sensitive to hardware and workload properties. Our intuition is that we can instead synthesize complex operations from their fundamental components, as we do for data layouts in the previous section, and then develop a hybrid way (through both benchmarks and models, but without significant human effort needed) to assign costs to each component individually. The main idea is that we learn a small set of cost models for fine-grained data access patterns out of which we can synthesize the cost of complex dictionary operations for arbitrary designs in the possible design space of data structures.

The middle part of Figure 2 depicts the components of the Data Calculator that make cost synthesis possible: 1) the library of data access primitives; 2) the cost learning module, which trains cost models for each access primitive depending on hardware and data properties; and 3) the operation and cost synthesis module, which synthesizes dictionary operations and their costs from the access primitives and the learned models. Next, we describe the process and components in detail.

Cost Synthesis from Data Access Primitives. Each access primitive characterizes one aspect of how data is accessed. For example, a binary search, a scan, a random read, a sequential read, and a random write are access primitives. The goal is that these primitives should be fundamental enough so that we can use them to synthesize operations over arbitrary designs as sequences of such primitives. There exist two levels of access primitives. Level 1 access primitives are marked with white color in Figure 2, and Level 2 access primitives are nested under Level 1 primitives and marked with gray color.

For example, a scan is a Level 1 access primitive used any time an operation needs to search a block of data where there is no order. At the same time, a scan may be designed and implemented in more than one way; this is exactly what Level 2 access primitives represent. For example, a scan may use SIMD instructions for parallelization if keys are nicely packed in vectors, and predication to minimize branch mispredictions within certain selectivity ranges. In the same way, a sorted search may use interpolation search if keys are arranged with uniform distribution. In this way, each Level 1 primitive is a conceptual access pattern, while each Level 2 primitive is an actual implementation that signifies a specific set of design choices. Every Level 1 access primitive has at least one Level 2 primitive and may be extended with any number of additional ones. The complete list of access primitives currently supported by the Data Calculator is shown in Table 1 in the appendix.

Learned Cost Models. For every Level 2 primitive, the Data Calculator contains one or more models that describe its performance (latency) behavior. These are not static models; they are trained and fitted for combinations of data and hardware profiles, as both those factors drastically affect performance. To train a model, each Level 2 primitive includes a minimal implementation that captures the behavior of the primitive, i.e., it isolates the performance effects of performing the specific action. For example, an implementation for a scan primitive simply scans an array, while an implementation for a random access primitive simply tries to access random locations in memory. These implementations are used to run a sequence of benchmarks to collect data for learning a model of the behavior of each primitive. Implementations should be in the target language/environment.
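As a concrete example of such a minimal implementation, the following micro-benchmark isolates the behavior of a scan primitive and records (size, time) pairs of the kind a cost model is later fitted to. Array sizes and repetition counts are arbitrary illustrative choices.

#include <chrono>
#include <cstdio>
#include <numeric>
#include <vector>

int main() {
  for (std::size_t n : {1u << 10, 1u << 14, 1u << 18, 1u << 22}) {
    std::vector<int> data(n);
    std::iota(data.begin(), data.end(), 0);
    long sink = 0;  // consume the result so the scan is not optimized away
    auto t0 = std::chrono::steady_clock::now();
    for (int rep = 0; rep < 100; ++rep)
      for (int v : data) sink += v;  // the bare-minimum scan behavior
    auto t1 = std::chrono::steady_clock::now();
    double sec = std::chrono::duration<double>(t1 - t0).count() / 100.0;
    std::printf("n=%zu scan=%.3e s (chk=%ld)\n", n, sec, sink);
  }
  return 0;
}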

The models are simple parametric models; given the design decision to keep primitives simple (so they can be easily reused), we have the domain expertise to expect how their performance behavior will look. For example, for scans, we have a strong intuition that they will be linear; for binary searches, that they will be logarithmic; and for random memory accesses, that they will be smoothed-out step functions (based on the probability of caching). These simple models have many advantages: they are interpretable, they train quickly, and they don't need a lot of data to converge. Through the training process, the Data Calculator learns coefficients of those models that capture hardware properties such as CPU and data movement costs.
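These shapes are simple enough to state directly. The sketch below writes them out in C++; every coefficient is a placeholder standing in for a value that training on a hardware profile would produce.

#include <cmath>

// Linear model for scans: cost grows with the number of keys.
double scan_model(double n, double a, double b) { return a * n + b; }

// Log-linear model for sorted (binary) search, as in Figure 4.
double sorted_search_model(double n, double a, double b, double c) {
  return a * n + b * std::log(n) + c;
}

// One smoothed step of the random-access model: a sigmoid over log region
// size, capturing the jump at a cache boundary.
double sigmoid_step(double log_region, double ci, double ki, double xi) {
  return ci / (1.0 + std::exp(-ki * (log_region - xi)));
}

int main() { return 0; }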

Hardware and data profiles hold descriptive information about data and hardware respectively (e.g., data distribution for data, and CPU, bandwidth, etc. for hardware). When an access primitive is trained on a data profile, it runs on a sample of such data, and when it is trained for a hardware profile, it runs on this exact hardware. Afterward, though, design questions can get accurate cost estimations on arbitrary access method designs without going over the data or having to have access to the specific machine. Overall, this is an offline process that is done once, and it can be repeated to include new hardware and data profiles or to include new access primitives.

Example: Binary Search Model. To give more intuition about how models are constructed, let us consider the case of a Level 2 primitive of binary searching a sorted array, as shown on the upper right part of Figure 4. The primitive contains a code snippet that implements the bare minimum behavior (Step 1 in Figure 4).

[Figure 4: Training an access primitive's cost model, here binary search over a sorted array {1, 11, 17, 37, 51, 66, 80, 94}. Step 1: the user designs a benchmark around the primitive's minimal implementation and chooses a model; Step 2: the system runs the benchmark and gathers data; Step 3: the model trains and produces a function for cost prediction. For sorted search, benchmark results over data size (KB) are fitted with a log-linear model $f(x) = ax + b\log x + c$; for random access, benchmark results over region size (KB) show jumps at the L1/L2/L3/memory boundaries (roughly 0.8e-8 s to 2.8e-8 s to 1.3e-7 s) and are fitted with a sum of sigmoids $\sum_i \frac{c_i}{1 + e^{-k_i(x - x_i)}}$.]
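The binary-search loop shown in the figure can be completed into a runnable sketch of the "bare minimum behavior" such a Level 2 primitive isolates for benchmarking; the enclosing function and the small test in main are added here for illustration.

#include <cstdio>
#include <vector>

// Lower-bound binary search: returns the first position with key >= search_val.
int binary_search(const std::vector<int>& data, int search_val) {
  int low = 0, high = (int)data.size();
  int middle = (low + high) / 2;
  while (low < high) {
    if (data[middle] < search_val) { low = middle + 1; }
    else                           { high = middle; }
    middle = (low + high) / 2;
  }
  return low;
}

int main() {
  std::vector<int> data{1, 11, 17, 37, 51, 66, 80, 94};   // the array from Figure 4
  std::printf("pos(51) = %d\n", binary_search(data, 51)); // prints 4
  return 0;
}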

Figure 5: Synthesizing the operation and cost for dictionary operation Get, given a data structure specification. [Inputs: structure layout specification (e.g., an internal node element with fanout.type = FIXED, fanout.fixedVal = 20, sorted = true, zoneMaps.min = true, retainsData = false, and a leaf node element with fanout.fixedVal = 64, sorted = true, retainsData = true), plus data and query workload and a hardware profile. Per-node operation synthesis walks the element's layout primitive checks (contains data? sorted? compressed? row-wise storage? zone maps? bloom filters? partitioning function? direct addressing?) and emits data access primitives (serial scan, sorted search, random probe, bloom filter access, decompress data), recursing only into matching sub-blocks after filtering with zone maps and bloom filters. The operation synthesis output (e.g., sorted search of zone maps, random probe to fetch node, sorted search of leaf data) is then translated into cost synthesis output via the learned models (BinarySearch, RandomProbe per node, with function call costs omitted).]

Calculating Random Accesses and Caching Effects. A crucial part in calculating the cost of most data structures is capturing random memory access costs, e.g., the cost of fetching nodes while traversing a tree, fetching nodes linked in a hash bucket, etc. If data is expected to be cold, then this is a rather straightforward case, i.e., we may assign the maximum cost a random access is expected to incur on the target machine. If data may be hot, though, it is a more involved scenario. For example, in a tree-like structure, internal nodes higher in the tree are much more likely to be at higher levels of the memory hierarchy during repeated requests. We calculate such costs using the random memory access primitive, as shown in the lower right part of Figure 4. Its input is a "region size", which is best thought of as the amount of memory we are randomly accessing into (i.e., we don't know where in this memory region our pointer points to). The primitive is trained via benchmarking access to an increasingly bigger contiguous array (Step 1 in Figure 4). The results (Step 2 in Figure 4) depict a minor jump from L1 to L2 (we can see a small bump just after 10^4 elements). The bump from L2 to L3 is much more noticeable, with the cost of accessing memory going from 0.1 × 10^-7 seconds to 0.3 × 10^-7 seconds as the memory size crosses the 128 KB boundary. Similarly, we see a bump from 0.3 × 10^-7 seconds to 1.3 × 10^-7 seconds when we go from L3 to main memory, at the L3 cache size of 16 MB.³ We capture this behavior as a sum of sigmoid functions, which are essentially smoothed step functions:

$$c(x) = \sum_{i=1}^{k} \frac{c_i}{1 + e^{-k_i(\log x - x_i)}} + c_0$$

³These numbers are in line with Intel's VTune.
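Evaluating the fitted model is a direct translation of the formula. The sketch below computes c(x) for a given region size using three sigmoids that mimic the L1-to-L2, L2-to-L3, and L3-to-memory bumps described above; every coefficient is a made-up placeholder, not a trained value.

#include <cmath>
#include <cstdio>

double random_access_cost(double region_bytes) {
  const double c0 = 0.8e-8;                       // base cost while cache-resident
  const double c[3]  = {0.4e-8, 2.0e-8, 1.0e-7};  // heights of the three jumps
  const double x0[3] = {std::log(32e3), std::log(128e3), std::log(16e6)};  // boundaries
  const double k = 4.0;                           // steepness of each smoothed step
  double cost = c0;
  for (int i = 0; i < 3; ++i)
    cost += c[i] / (1.0 + std::exp(-k * (std::log(region_bytes) - x0[i])));
  return cost;
}

int main() {
  for (double region : {16e3, 1e6, 1e9})
    std::printf("region=%.0e B -> %.2e s\n", region, random_access_cost(region));
  return 0;
}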

This primitive is used for calculating random access to any physical or logical region (e.g., a sequence of nodes that may be cached together). For example, when traversing a tree, the cost synthesis operation costs random accesses with respect to the amount of data that may be cached up to this point. That is, for every node we need to access at level x of a tree, we account for a region size that includes all data in all levels of the tree up to level x. In this way, accessing a node higher in the tree costs less than a node at lower levels. The same is true when accessing buckets of a hash table. We give a detailed step-by-step example below.

Example: Cache-aware Cost Synthesis. Assume a B-tree-like specification as follows: two node types, one for internal nodes and one for leaf nodes. Internal nodes contain fence pointers, are sorted, balanced, have a fixed fanout of 20, and do not contain any keys or values. Leaf nodes instead are terminal; they contain both keys and values, are sorted, have a maximum page size of 250 records, and follow a full columnar format, where keys and values are stored in independent arrays. The test dataset consists of 10^5 records where keys and values are 8 bytes each. Overall, this means that we have 400 full data pages, and thus a tree of height 2. The Data Calculator needs two of its access primitives to calculate the cost of a Get operation. Every Get query will be routed through two internal nodes and one leaf node: within each node, it needs to binary search (through fence pointers for internal nodes and through keys in leaf nodes), and thus it will make use of the Sorted Search access primitive. In addition, as a query traverses the tree, it needs to perform a random access for every hop.

Now, let us look in more detail at how these two primitives are used given the exact specification of this data structure. The Sorted Search primitive takes as input the size of the area over which we will binary search and the number of keys. The Random Access primitive takes as input the size of the path so far, which allows us to take into account caching effects. Each query starts by visiting the root node. The Data Calculator estimates the size of the path so far to be 312 bytes. This is because the size of the path so far is in practice equal to the size of the root node, which contains 20 pointers (because the fanout is 20) and 19 fence values, summing up to root = internalnode = 20 × 8 + 19 × 8 = 312 bytes. In this way, the Data Calculator logs a cost of RandomAccess(312) to access the root node. Then, it calculates the cost of binary search across 19 fences, thus logging a cost of SortedSearch(RowStore, 19 × 8). It uses the "RowStore" option as fences and pointers are stored as pairs within each internal node. Now, the access to the root node is fully accounted for, and the Data Calculator moves on to cost the access at the next tree level. Now the size of the path so far is given by accounting for the whole next level in addition to the root node. This is in total level2 = root + fanout × internalnode = 312 + 20 × 312 = 6552 bytes (due to the fanout being 20, we account for 20 nodes at the next level). Thus, to access the next node, the Data Calculator logs a cost of RandomAccess(6552) and again a search cost of SortedSearch(RowStore, 19 × 8) to search this node. The last step is to search the leaf level. Now the size of the path so far is given by accounting for the whole size of the tree, which is level2 + 400 × (250 × 16) = 1606552 bytes, since we have 400 pages at the next level (20 × 20) and each page has 250 records of key-value pairs (8 bytes each). In this way, the Data Calculator logs a cost of RandomAccess(1606552) to access the leaf node, followed by a sorted search of SortedSearch(ColumnStore, 250 × 8) to search the keys. It uses the "ColumnStore" option as keys and values are stored separately in each leaf in different arrays. Finally, a cost of RandomAccess(2000) is incurred to access the target value in the values array (we have 8 × 250 = 2000 bytes in each leaf).
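The region-size arithmetic of this example is easy to check mechanically; the short program below reproduces the 312, 6552, and 1,606,552 byte figures (a sanity check of the worked example, not the Calculator's code).

#include <cstdio>

int main() {
  const int  fanout       = 20;
  const int  leaf_records = 250;
  const int  kv_bytes     = 16;                    // 8-byte key + 8-byte value
  const int  internal     = 20 * 8 + 19 * 8;       // 20 pointers + 19 fences = 312 B
  const long level2       = internal + fanout * internal;            // 6552 B
  const long whole_tree   = level2 + 400L * leaf_records * kv_bytes; // + 400 leaf pages

  std::printf("root region:    %d B\n", internal);    // RandomAccess(312)
  std::printf("level-2 region: %ld B\n", level2);     // RandomAccess(6552)
  std::printf("tree region:    %ld B\n", whole_tree); // RandomAccess(1606552)
  return 0;
}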

Sets of Operations. The description above considers a single operation. The Data Calculator can also compute the latency for a set of operations concurrently in a single pass. This is effectively the same process as shown in Figure 5, only that in every recursion we may follow more than one path, and in every step we are computing the latency for all queries that would visit a given node.

Workload Skew and Caching Effects. Another parameter that can influence caching effects is workload skew. For example, repeatedly accessing the same path of a data structure results in all nodes in this path being cached with higher probability than others. The Data Calculator first generates counts of how many times every node is going to be accessed for a given workload. Using these counts and the total number of nodes accessed, we get a factor p = count/total that denotes the popularity of a node. Then, to calculate the random access cost to a node for an operation k, a weight w = 1/(p × sid) is used, where sid is the sequence number of this operation in the workload (refreshed periodically). Frequently accessed nodes see smaller access costs and vice versa.
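A minimal sketch of this skew adjustment, with illustrative access counts:

#include <cstdio>
#include <vector>

int main() {
  std::vector<long> node_counts{900, 60, 40};  // accesses per node in the workload
  const long total = 1000;                     // total node accesses
  const int  sid   = 5;                        // sequence number of this operation
  for (std::size_t i = 0; i < node_counts.size(); ++i) {
    double p = double(node_counts[i]) / double(total);  // node popularity
    double w = 1.0 / (p * sid);                // weight on the random-access cost
    std::printf("node %zu: p=%.2f w=%.2f\n", i, p, w);  // hot nodes weigh less
  }
  return 0;
}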

Training Primitives. All access primitives are trained on warm caches. This is because they are used to calculate the cost on a node that is already fetched. The only special case is the Random Access primitive, which is used to calculate the cost of fetching a node. This is also trained on warm data, though, since the cost synthesis infrastructure takes care at a higher level to pass the right region size, as discussed; in the case this region is big, this can still result in costing a page fault, as large data will not fit in the cache, which is reflected in the Random Access primitive model.

Limitations. For individual queries, certain access primitives are hard to estimate precisely without running the actual code on an exact data instance. For example, a scan for a point Get may abort after checking just a few values, or it may need to go all the way to the end of an array. In this way, while lower or upper performance bounds can be computed with absolute confidence for both individual queries and sets of queries, actual performance estimation works best for sets.

More Operations. The cost of range queries and bulk loading is synthesized as shown in Figure 10 in the appendix.

Extensibility and Cross-pollination. The rationale of having two levels of access primitives is threefold. First, it brings a level of abstraction, allowing higher-level cost synthesis algorithms to operate at Level 1 only. Second, it brings extensibility, i.e., we can add new Level 2 primitives without affecting the overall architecture. Third, it enhances "cross-pollination" of design concepts captured by Level 2 primitives across designs. Consider the following example. An engineer comes up with a new algorithm to perform search over a sorted array, e.g., exploiting new hardware instructions. To test if this can improve performance in her B-tree design, where she regularly searches over sorted arrays, she codes up a benchmark for a new sorted search Level 2 primitive and plugs it into the Calculator as shown in Figure 4. Then the original B-tree design can be easily tested with and without the new sorted search across several workloads and hardware profiles without having to undergo a lengthy implementation phase. At the same time, the new primitive can now be considered by any data structure design that contains a sorted array, such as an LSM-tree with sorted runs, a hash-table with sorted buckets, and so on. This allows easy transfer of ideas and optimizations across designs, a process that usually requires a full study for each optimization and target design.
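One plausible shape for such a plug-in point is a registry keyed by Level 1 pattern, where each Level 2 entry pairs a benchmarkable implementation with its fitted cost model. The interface below is an assumption sketched for illustration, not the Calculator's actual extension API.

#include <functional>
#include <map>
#include <string>

struct Level2Primitive {
  std::function<int(const int*, int, int)> impl;  // minimal benchmarkable implementation
  std::function<double(double)> cost_model;       // fitted cost(size) model
};

// Level 1 pattern name -> named Level 2 variants.
std::map<std::string, std::map<std::string, Level2Primitive>> registry;

void register_primitive(const std::string& level1, const std::string& name,
                        Level2Primitive p) {
  // e.g. register_primitive("SortedSearch", "interpolation", {...}) makes the
  // new variant available to every design that contains a sorted array.
  registry[level1][name] = std::move(p);
}

int main() { return 0; }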

4 WHAT-IF DESIGN AND AUTO-COMPLETION

The ability to synthesize the performance cost of arbitrary designs allows for the development of algorithms that search the possible design space. We expect there will be numerous opportunities in this space for techniques that can use this ability to: 1) improve the productivity of engineers by quickly iterating over designs and scenarios before committing to an implementation (or hardware); 2) accelerate research by allowing researchers to easily and quickly test completely new ideas; 3) develop educational tools that allow for rapid testing of concepts; and 4) develop algorithms for offline auto-tuning and online adaptive systems that transition between designs. In this section, we provide two such opportunities for what-if design and auto-completion of partial designs.

What-if Design. One can form design questions by varying any one of the input parameters of the Data Calculator: 1) data structure (layout) specification, 2) hardware profile, and 3) workload (data and queries). For example, assume one already uses a B-tree-like design for a given workload and hardware scenario. The Data Calculator can answer design questions such as "What would be the performance impact if I change my B-tree design by adding a bloom filter [...]"
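To illustrate the workflow, the sketch below costs the same toy design under two hypothetical hardware profiles and compares the results; synthesize_cost here is a stand-in toy function and both profiles are invented, since the point is only the shape of a what-if comparison.

#include <cstdio>

struct HardwareProfile { double random_access_ns, search_ns_per_key; };

// Toy stand-in for cost synthesis: h random accesses plus one leaf search.
double synthesize_cost(const HardwareProfile& hw, int levels, int leaf_keys) {
  return levels * hw.random_access_ns + leaf_keys * hw.search_ns_per_key;
}

int main() {
  HardwareProfile small_ram{120.0, 0.8}, big_ram{90.0, 0.5};  // invented profiles
  double a = synthesize_cost(small_ram, 3, 250);
  double b = synthesize_cost(big_ram, 3, 250);
  std::printf("what-if: %.0f ns vs %.0f ns (%.2fx)\n", a, b, a / b);
  return 0;
}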

  • !"#$%&’()$*+),CPU: 64x2.3GHzL3: 46MB RAM: 256GB

    !"- $%&’()$*+),CPU: 4x2.3GHzL3: 46MB RAM: 16GB

    !". $%&’()$*+),CPU: 64x2GHzL3: 16MB RAM: 1TB

    !". $/%01)+,CPU: 64x2GHzL3: 16MB RAM: 1TB

    !". $21(*+$*+),CPU: 64x2GHzL3: 16MB RAM: 1TB

    ! !!

    !!

    !!

    !!!!!

    ! !!

    !!

    !!

    !!!!!

    ! !!

    !!

    !!

    !!!!!

    ! !!

    !!

    !!

    !!!!!

    ! !!

    !!

    !!

    !!!!!

    ! !!

    !!

    !!

    !!!!!

    ! !!

    !!

    !!

    !!!!!

    ! !!

    !!

    !!

    !!!!!

    105 105.5 106 106.5 107 105 105.5 106 106.5 107 105 105.5 106 106.5 107 105 105.5 106 106.5 107 105 105.5 106 106.5 107 105 105.5 106 106.5 107 105 105.5 106 106.5 107 105 105.5 106 106.5 1070.0e+00

    5.0e! 02

    1.0e! 01

    1.5e! 01

    2.0e! 01

    0.0e+001.0e! 022.0e! 023.0e! 024.0e! 025.0e! 02

    0.0e+00

    2.0e! 02

    4.0e! 02

    6.0e! 02

    0.0e+00

    3.0e! 01

    6.0e! 01

    9.0e! 01

    0.0e+00

    1.0e! 02

    2.0e! 02

    3.0e! 02

    4.0e! 02

    0.0e+00

    2.0e! 02

    4.0e! 02

    6.0e! 02

    0.0e+00

    5.0e! 02

    1.0e! 01

    1.5e! 01

    0.0e+00

    5.0e! 02

    1.0e! 01

    1.5e! 01

    2.0e! 01

    Number of entries (log scale)

    Late

    ncy

    (sec

    .)

    !!

    !

    !!

    !!

    !

    !

    !

    !

    !

    !

    !!

    !

    !!

    !

    !

    !

    !

    !

    !

    !

    !

    !!

    !

    !!

    !!

    !

    !

    !

    !

    !

    !

    !

    !

    !

    !

    !

    !

    !

    !

    !

    !

    !

    !!

    !

    !

    !!

    !

    !

    !

    !

    !

    !

    !

    ! !

    !

    !

    !

    !

    !

    !!

    !

    !

    !

    !

    !!

    !

    !

    !

    !

    !

    !

    !

    !

    !

    !

    !

    ! !

    !

    !!

    !!

    !

    !

    !!

    !!

    ! !

    0.0e+00

    2.5e! 07

    5.0e! 07

    7.5e! 07

    1.0e! 06

    0.0e+00

    2.5e! 07

    5.0e! 07

    7.5e! 07

    1.0e! 06

    0.0e+00

    5.0e! 07

    1.0e! 06

    1.5e! 06

    0.0e+00

    1.0e! 06

    2.0e! 06

    0.0e+00

    2.0e! 06

    4.0e! 06

    0.0e+00

    1.0e! 04

    2.0e! 04

    3.0e! 04

    0.0e+00

    5.0e! 03

    1.0e! 02

    1.5e! 02

    2.0e! 02

    0.0e+00

    1.0e! 02

    2.0e! 02

    3.0e! 02

    Late

    ncy

    (sec

    .)

    !!

    !

    !!

    !!

    !

    !

    !

    !

    !

    !

    !!

    !

    !!

    !

    !

    !

    !

    !

    !

    !

    !

    !!

    !

    !!

    !!

    !

    !

    !

    !

    !

    !

    !

    !

    !

    !

    !

    !

    !

    !

    !

    !

    !

    !!

    !

    !

    !!

    !

    !

    !

    !

    !

    !!

    ! !

    !

    !

    !

    !!

    !!

    !

    !

    !

    !

    ! !

    !

    !

    !

    !

    !

    !

    !

    !

    !

    !

    !

    ! !

    !

    ! !!

    !

    !!

    !!

    !!

    !!

    0.0e+00

    2.5e−075.0e−077.5e−071.0e−06

    0.0e+00

    2.5e−075.0e−077.5e−071.0e−06

    0.0e+00

    5.0e−07

    1.0e−06

    1.5e−06

    0.0e+00

    1.0e−06

    2.0e−06

    0.0e+00

    2.0e−06

    4.0e−06

    0.0e+00

    1.0e−04

    2.0e−04

    3.0e−04

    0.0e+00

    5.0e−031.0e−021.5e−022.0e−02

    0.0e+00

    1.0e−02

    2.0e−02

    3.0e−02

    Late

    ncy

    (sec

    .)

    ●●

    ●●

    ●●

    ●●

    ●●

    ● ●

    ●●

    ●●

    ●●

    ●● ●●

    ●●

    ● ●●● ●

    0.0e+00

    2.0e−07

    4.0e−07

    0.0e+00

    2.0e−074.0e−076.0e−078.0e−07

    0.0e+00

    3.0e−07

    6.0e−07

    9.0e−07

    0.0e+00

    5.0e−07

    1.0e−06

    1.5e−06

    0.0e+001.0e−062.0e−063.0e−064.0e−065.0e−06

    0.0e+00

    5.0e−05

    1.0e−04

    1.5e−04

    0.0e+00

    5.0e−03

    1.0e−02

    1.5e−02

    0.0e+00

    5.0e−03

    1.0e−02

    1.5e−02

    Late

    ncy

    (sec

    .)

    !!

    !

    !!

    !

    !

    !

    !

    !

    !

    !

    !

    !!

    !

    !

    !

    !

    !

    !

    !

    !

    !

    !

    !

    !!

    !

    !!

    !

    !

    !

    !

    !

    !

    !

    !

    !

    !

    !

    !

    !

    !

    !

    !

    !

    !

    !

    ! !

    !

    !

    !!

    !!

    !

    !

    !

    !!

    !!

    !

    !

    !

    !!

    !!

    !

    !

    !

    !

    !!

    !

    !

    !

    !

    !

    !

    !

    !!

    !

    ! !!

    ! !!!!

    !!! !

    !! !

    LINKEDLIST ARRAY RANGE! PARTIT.LINKED LIST SKIP! LIST TRIE B+TREE SORTED! ARRAY HASH! TABLE

    0.0e+00

    2.0e! 07

    4.0e! 07

    0.0e+00

    2.0e! 07

    4.0e! 07

    6.0e! 07

    8.0e! 07

    0.0e+00

    3.0e! 07

    6.0e! 07

    9.0e! 07

    0.0e+00

    5.0e! 07

    1.0e! 06

    1.5e! 06

    0.0e+00

    1.0e! 06

    2.0e! 06

    3.0e! 06

    4.0e! 06

    0.0e+00

    5.0e! 05

    1.0e! 04

    1.5e! 04

    0.0e+00

    5.0e! 03

    1.0e! 02

    1.5e! 02

    0.0e+00

    5.0e! 03

    1.0e! 02

    1.5e! 02

    Late

    ncy

    (sec

    .)

    !

    Data Calculator

    Implementation

    Figure 6: The Data Calculator can accurately compute the latency of arbitrary data structure designs across a diverse set ofhardware and for diverse dictionary operations."lter in each leaf?Ó The user simply needs to give as input the high-level speci"cation of the existing design and cost it twice: oncewith the original design and once with the bloom "lter variation. Inboth cases, costing should be done with the original data, queries,and hardware pro"le so the results are comparable. In other words,users can quickly test variations of data structure designs simplyby altering a high level speci"cation, without having to implement,debug, and test a new design.Similarly, by altering the hardware orworkload inputs, a given speci"cation can be tested quickly on alter-native environments without having to actually deploy code to thisnew environment. For example, in order to test the impact of newhardware the Calculator only needs to train its Level 2 primitiveson this hardware, a process that takes a few minutes. Then, one cantest the impact this new hardware would have on arbitrary designsby running what-if questions without having implementations ofthose designs and without accessing the new hardware.Auto-completion. The Data Calculator can also complete partiallayout speci"cations given a workload, and a hardware pro"le. Theprocess is shown in Algorithm 1 in the appendix: The input is apartial layout speci"cation, data, queries, hardware, and the set ofthe design space that should be considered as part of the solution,i.e., a list of candidate elements. Starting from the last known pointof the partial speci"cation, the Data Calculator computes the rest ofthe missing subtree of the hierarchy of elements. At each step thealgorithm considers a new element as candidate for one of the nodesof the missing subtree and computes the cost for the di!erent kindsof dictionary operations present in the workload. This design is keptonly if it is better than all previous ones, otherwise it is droppedbefore the next iteration. The algorithm uses a cache to rememberspeci"cations and their costs to avoid recomputation. This processcan also be used to tell if an existing design can be improved bymarking a portion of its speci"cation as Òto be testedÓ. Solving thesearch problem completely is an open challenge as the design spacedrawn by the Calculator is massive. Here we show a "rst step whichallows search algorithms to select from a restricted set of elementswhich are also given as input as opposed to searching the wholeset of possible primitive combinations.
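As a rough illustration of the search loop just described (Algorithm 1 itself is in the appendix), the following sketch assumes a hypothetical `cost_of()` oracle standing in for the cost-synthesis engine; the names and interfaces here are ours, not the Calculator's:

```python
# A minimal sketch of greedy auto-completion with memoized cost synthesis.
# cost_of(spec, workload, hardware) is a hypothetical oracle standing in for
# the Calculator's cost-synthesis layer.
def auto_complete(partial_spec, missing_slots, candidates,
                  workload, hardware, cost_of):
    memo = {}  # cache: specification -> synthesized cost (avoids recomputation)

    def cost(spec):
        key = tuple(sorted(spec.items()))
        if key not in memo:
            memo[key] = cost_of(spec, workload, hardware)
        return memo[key]

    spec = dict(partial_spec)
    # Starting from the last known point of the partial specification,
    # fill each missing node of the element hierarchy in turn.
    for slot in missing_slots:
        best_elem, best_cost = None, float("inf")
        for elem in candidates:            # restricted element set given as input
            trial = dict(spec, **{slot: elem})
            c = cost(trial)                # cost over all workload operations
            if c < best_cost:              # keep only if better than all previous
                best_elem, best_cost = elem, c
        spec[slot] = best_elem
    return spec
```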

5 EXPERIMENTAL ANALYSIS

We now demonstrate the ability of the Data Calculator to help with rich design questions by accurately synthesizing performance costs.

Implementation. The core implementation of the Data Calculator is in C++. This includes the expert systems that handle layout primitives and cost synthesis. A separate module implemented in Python is responsible for analyzing benchmark results of Level 2 access primitives and generating the learned models. The benchmarks of Level 2 access primitives are also implemented in C++ such that the learned models can capture performance and hardware characteristics that would affect a full C++ implementation of a data structure. The learning process for each Level 2 access primitive is done each time we need to include a new hardware profile; then, the learned coefficients for each model are passed to the C++ back-end to be used for cost synthesis during design questions. For learning we use a standard loss function, i.e., least square errors, and the actual process is done via standard optimization libraries, e.g., SciPy's curve fit. For models which have non-convex loss functions, such as the sum-of-sigmoids model, we algorithmically set up good initial parameters.
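To make the fitting step concrete, here is a minimal sketch of fitting a sum-of-sigmoids model with SciPy's curve_fit. The specific functional form, the synthetic benchmark data, and the initial-parameter choices below are our own illustrative assumptions, not the Calculator's exact models:

```python
# A minimal sketch: least-squares fitting of a sum-of-sigmoids cost model
# with SciPy's curve_fit. The functional form and parameter values are
# illustrative; the benchmark data here is synthetic.
import numpy as np
from scipy.optimize import curve_fit

def sum_of_sigmoids(x, a1, b1, c1, a2, b2, c2, d):
    # Two sigmoids in log10(region size) capture the latency steps that
    # appear when a region outgrows successive levels of the cache hierarchy.
    return (a1 / (1.0 + np.exp(-b1 * (x - c1)))
            + a2 / (1.0 + np.exp(-b2 * (x - c2)))
            + d)

# x: log10 of region size in bytes; y: measured latency of one access
# primitive (synthesized with noise here as a stand-in for benchmark output).
x = np.linspace(3.0, 9.0, 60)
true = sum_of_sigmoids(x, 2e-8, 4.0, 5.5, 8e-8, 4.0, 7.5, 1e-8)
y = true + np.random.normal(0.0, 5e-10, x.size)

# The loss is non-convex, so initial parameters are placed algorithmically,
# e.g., near suspected cache-boundary sizes (here ~10^5.5 and ~10^7.5 bytes).
p0 = [1e-8, 1.0, 5.0, 1e-7, 1.0, 7.0, 1e-8]
params, _ = curve_fit(sum_of_sigmoids, x, y, p0=p0, maxfev=20000)
```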

Accurate Cost Synthesis. In our first experiment we test the ability to accurately cost arbitrary data structure specifications across different machines. To do this we compare the cost generated automatically by the Data Calculator with the cost observed when testing a full implementation of a data structure. We set up the experiment as follows. To test with the Data Calculator, we manually wrote data structure specifications for eight well-known access methods: 1) Array, 2) Sorted Array, 3) Linked-list, 4) Partitioned Linked-list, 5) Skip-list, 6) Trie, 7) Hash-table, and 8) B+tree. The Data Calculator was then responsible for generating the design of operations for each data structure and computing their latency given a workload. To verify the results against an actual implementation, we implemented all data structures above. We also implemented algorithms for each of their basic operations: Get, Range Get, Bulk Load, and Update. The first experiment then starts with a data workload of 10^5 uniformly distributed integers and a sequence of 10^2 Get requests, also uniformly distributed. We incrementally insert more data up to a total of 10^7 entries and at each step we repeat the query workload.

Figure 7: Computing bulk-loading cost (left) and training cost across diverse hardware (right). (Panel (a): total bulk-loading latency (sec.) per data structure, Trie, B+tree, Skip-list, Sorted Array, Range-partitioned Linked-list, Hash-table, Linked-list, Array, Data Calculator vs. implementation; panel (b): training time (sec.) of Level 2 primitives on HW1, HW2, and HW3.)

Figure 8: Accurately computing the latency of cache-conscious designs in diverse hardware and workloads. (Left: CSB+tree Get latency (microsec.) vs. number of entries, 10^5 to 10^7, log scale, on HW1, HW2, and HW3; right: B+tree and CSB+tree latency across Zipf alpha parameters 0.5 to 2.0 on HW1; Data Calculator estimates vs. actual implementation.)

The top row of Figure 6 depicts results using a machine with 64 cores and 256 GB of RAM (HW1). It shows the average latency per query as data grows, as computed by the Data Calculator and as observed when running the actual implementation on this machine. For ease of presentation, results are ranked horizontally from slower to faster (left to right). The Data Calculator gives an accurate estimation of the cost across the whole range of data sizes and regardless of the complexity of the designs. It can accurately compute the latency of both simple traversals in a plain array and of more complex access patterns such as descending a tree and performing random hops in memory.

Diverse Machines and Operations. The rest of the rows in Figure 6 repeat the same experiment as above using different hardware in terms of both CPU and memory properties (rows 2 and 3) and different operations (rows 4 and 5). The details of the hardware are shown on the right side of each row in Figure 6. Regardless of the machine or operation, the Data Calculator can accurately cost any design. By training its Level 2 primitives on individual machines and maintaining a profile for each one of them, it can quickly test arbitrary designs over arbitrary hardware and operations. Updates here are simple updates that change the value of a key-value pair, and so they are effectively the same as a point query with an additional write access. More complex updates that involve restructuring are left for future work, both in terms of the design space and cost synthesis. Finally, Figure 7(a) depicts that the Data Calculator can accurately synthesize the bulk-loading costs for all data structures.

Training Access Primitives. Figure 7(b) depicts the time needed to train all Level 2 primitives on a diverse set of machines. Overall, this is an inexpensive process: it takes merely a few minutes to train multiple different combinations of data and hardware profiles.

Figure 9: The Data Calculator designs new hybrids of known data structures to match a given workload. ((a) Scenario 1: a hybrid B+tree/Hash-table/Array with a hash partitioning element on top, unsorted data pages for the write-only region, and B+tree elements over data pages for the point-get-intensive region. (b) Scenario 2: a hybrid Hash-table/B+tree that also routes the range-intensive region to B+tree elements over large-chunk data pages, with system-page-size data pages for the point-get-intensive region. Insets: synthesis cost (min.) vs. number of inserts, 10^5 to 10^7.)

Cache Conscious Designs and Skew. In addition, Figure 8 repeats our base fitting experiment using a cache-conscious design, Cache Conscious B+tree (CSB). Figure 8(a) depicts that the Data Calculator accurately predicts the performance behavior across a diverse set of machines, capturing caching effects of growing data sizes and design patterns where the relative position of nodes affects tree traversal costs. We use the "Full" design from Cache Conscious B+tree [67]. Furthermore, Figure 8(b) tests the fitting when the workload exhibits skew. For this experiment, Get queries were generated with a Zipfian distribution, alpha = {0.5, 1.0, 1.5, 2.0}. Figure 8(b) shows that in the implementation results, workload skew improves performance, and in fact it improves more for the standard B+tree. This is because the same paths are more likely to be taken by queries, resulting in these nodes being cached more often. Cache Conscious B+tree improves but at a much slower rate, as it is already optimized for the cache hierarchy. The Data Calculator is able to synthesize these costs accurately, capturing skew and the related caching effects.
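For intuition, a small sketch of generating such a skewed Get workload follows (our own illustration, not the paper's workload generator); we sample from an explicit finite-domain Zipfian distribution, since alpha values of 1.0 and below are only meaningful over a finite key space:

```python
# A minimal sketch: Zipfian-distributed Get queries over a finite key space.
import numpy as np

def zipf_queries(keys, alpha, n_queries, seed=0):
    rng = np.random.default_rng(seed)
    ranks = np.arange(1, len(keys) + 1, dtype=float)
    probs = ranks ** -alpha          # unnormalized Zipf mass per ranked key
    probs /= probs.sum()
    return rng.choice(keys, size=n_queries, p=probs)

keys = np.arange(10**5)              # keys already loaded in the structure
workloads = {a: zipf_queries(keys, a, 100) for a in (0.5, 1.0, 1.5, 2.0)}
```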

Rich Design Questions. In our next experiment, we provide insights about the kinds of design questions possible and how long they may take, working over a B-tree design and a workload of uniform data and queries: 1 million inserts and 100 point Gets. The hardware profile used is HW1 (defined in Figure 6). The user asks "What if we change our hardware to HW3?" It takes the Data Calculator only 20 seconds (all runs are done on HW3) to compute that the performance would drop. The user then asks: "Is there a better design for this new hardware and workload if we restrict the search to a specific set of five possible elements?" (from the pool of elements on the right side of Figure 3). It takes only 47 seconds for the Data Calculator to compute the best choice. The user then asks "Would it be beneficial to add a bloom filter in all B-tree leaves?" The Data Calculator computes in merely 20 seconds that such a design change would be beneficial for the current workload and hardware. The next design question is: "What if the query workload changes to have skew targeting just 0.01% of the key space?" The Data Calculator computes in 24 seconds that this new workload would hurt the original design, and it computes a better design in another 47 seconds.

In two of the design phases the user asked "give me a better design if possible". We now provide more intuition for this kind of design question, regarding both the cost and scalability of computing such designs and the kinds of designs the Data Calculator may produce to fit a workload. We test two scenarios for a workload

of mixed reads and writes (uniformly distributed inserts and point reads) and hardware profile HW3. In the first scenario, all reads are point queries in 20% of the domain. In the second scenario, 50% of the reads are point reads and touch 10% of the domain, while the other half are range queries and touch a different (non-intersecting with the point reads) 10% of the domain. We do not provide the Data Calculator with an initial specification. Given the composition of the workload, our intuition is that a mix of hashing, B-tree-like indexing (e.g., with quantile nodes and sorted pages), and a simple log (unsorted pages) might lead to a good design, and so we instruct the Data Calculator to use those four elements to construct a design (this is done using Algorithm 1, but starting with an empty specification). Figure 9 depicts the specifications of the resulting data structures. For the first scenario (left side of Figure 9), the Data Calculator computed a design where a hashing element at the upper levels of the hierarchy allows quick access to data, but data is then split between the write-intensive and read-intensive parts of the domain: simple unsorted pages (like a log) for the writes and B+tree-style indexing for the read-intensive part. For the second scenario (right side of Figure 9), the Data Calculator produces a design which, similarly to the previous one, takes care of reads and writes separately, but this time also distinguishes between range and point Gets by allowing the part of the domain that receives point queries to be accessed with hashing and the rest via B+tree-style indexing. The time needed for each design question was on the order of a few seconds up to 30 minutes, depending on the size of the sample workload (the synthesis costs are embedded in Figure 9 for both scenarios). Thus, the Data Calculator quickly answers complex questions that would normally take humans days or even weeks to test fully.
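To give a feel for the outputs, the Scenario 1 design on the left of Figure 9 could be rendered roughly as the nested description below; the syntax is entirely hypothetical, since the Calculator has its own specification format:

```python
# A hypothetical rendering (not the Calculator's actual specification syntax)
# of the Scenario 1 hybrid: hash partitioning on top, a log-like unsorted
# region for writes, and B+tree-style indexing for point-get-heavy reads.
scenario1_spec = {
    "element": "hash_partitioning",
    "children": [
        {"region": "only_writes",
         "element": "data_page", "layout": "unsorted"},     # simple log
        {"region": "point_get_intensive",
         "element": "b_plus_tree",
         "leaves": {"element": "data_page", "layout": "sorted"}},
    ],
}
```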

6 RELATED WORK

To the best of our knowledge, this is the first work to discuss the problem of interactive data structure design and to compute the impact on performance. However, there are numerous areas from which we draw inspiration and with which we share concepts.

Interactive Design. Conceptually, the work on Magic for layout on integrated circuits [62] comes closest to our work. Magic uses a set of design rules to quickly verify transistor designs so they can be simulated by designers. In other words, a designer may propose a transistor design and Magic will determine whether it is correct or not. Naturally, this is a huge step, especially for hardware design, where actual implementation is extremely costly. The Data Calculator pushes interactive design one step further by incorporating cost estimation into the design phase: it can estimate the cost of adding or removing individual design options, which in turn also allows us to build design algorithms for automatic discovery of good and bad designs instead of having to build and test each complete design manually.

Generalized Indexes. One of the stronger connections is the work on Generalized Search Tree Indexes (GiST) [6, 7, 38, 47–50]. GiST aims to make it easy to extend data structures and tailor them to specific problems and data with minimal effort. It is a template, an abstract index definition that allows designers and developers to implement a large class of indexes. The original proposal focused on record retrieval only, but later work added support for concurrency [48], a more general API [6], improved performance [47], selectivity

estimation on generated indexes [7], and even visual tools that help with debugging [49, 50]. While the Data Calculator and GiST share motivation, they are fundamentally different: GiST is a template for implementing tailored indexes, while the Data Calculator is an engine that computes the performance of a design, enabling rich design questions that compute the impact of design choices before we start coding. This makes the two lines of work complementary.

Modular/Extensible Systems and System Synthesizers. A key part of the Data Calculator is its design library, which breaks a design space down into components and can then use any set of those components as a solution. As such, the Data Calculator shares concepts with the stream of work on modular systems, an idea that has been explored in many areas of computer science: in databases for easily adding data types [31, 32, 60, 61, 70] with minimal implementation effort, or plug-and-play features and whole system components with clean interfaces [11, 14, 17, 45, 53, 54], as well as in software engineering [63], computer architecture [62], and networks [46]. Since for every area the output and the components are different, there are particular challenges that have to do with defining the proper components, interfaces, and algorithms. The concept of modularity is similar in the context of the Data Calculator; the goal and application of the concept are different, though.

Additional Topics. Appendix B discusses additional related topics such as auto-tuning systems and data representation synthesis in programming languages.

7 SUMMARY AND NEXT STEPS

Through a new paradigm of first principles of data layouts and learned cost models, the Data Calculator allows researchers and engineers to interactively and semi-automatically navigate complex design decisions when designing or re-designing data structures, considering new workloads and hardware. The design space we presented here includes basic layout primitives and primitives that enable cache-conscious designs by dictating the relative positioning of nodes, focusing on read-only queries. The quest for the first principles of data structures needs to continue, to find the primitives for additional significant classes of designs including updates, compression, concurrency, adaptivity, graphs, spatial data, version control management, and replication. Such steps will also require innovations for cost synthesis. For every design class added (or even for every single primitive added), the knowledge gained in terms of the possible data structure designs grows exponentially. Additional opportunities include full DSLs for data structures, compilers for code generation and eventually certified code [66, 73], new classes of adaptive systems that can change their core design on-the-fly, and machine learning algorithms that can search the whole design space.

8 ACKNOWLEDGMENTS

We thank the reviewers for valuable feedback and direction. Mark Callaghan provided the quotes on the importance of data structure design. Harvard DASlab members Yiyou Sun, Mali Akmanalp, and Mo Sun helped with parts of the implementation and the graphics. This work is partially funded by the USA National Science Foundation project IIS-1452595.

REFERENCES
[1] Daniel J. Abadi, Peter Boncz, Stavros Harizopoulos, Stratos Idreos, and Samuel Madden. 2013. The Design and Implementation of Modern Column-Oriented Database Systems. Foundations and Trends in Databases 5, 3 (2013), 197–280.
[2] Dana Van Aken, Andrew Pavlo, Geoffrey J. Gordon, and Bohan Zhang. 2017. Automatic Database Management System Tuning Through Large-scale Machine Learning. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 1009–1024.
[3] Ioannis Alagiannis, Stratos Idreos, and Anastasia Ailamaki. 2014. H2O: A Hands-free Adaptive Store. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 1103–1114.
[4] Victor Alvarez, Felix Martin Schuhknecht, Jens Dittrich, and Stefan Richter. 2014. Main Memory Adaptive Indexing for Multi-Core Systems. In Proceedings of the International Workshop on Data Management on New Hardware (DAMON). 3:1–3:10.
[5] Michael R. Anderson, Dolan Antenucci, Victor Bittorf, Matthew Burgess, Michael J. Cafarella, Arun Kumar, Feng Niu, Yongjoo Park, Christopher Ré, and Ce Zhang. 2013. Brainwash: A Data System for Feature Engineering. In Proceedings of the Biennial Conference on Innovative Data Systems Research (CIDR).
[6] Paul M. Aoki. 1998. Generalizing "Search" in Generalized Search Trees (Extended Abstract). In Proceedings of the IEEE International Conference on Data Engineering (ICDE). 380–389.
[7] Paul M. Aoki. 1999. How to Avoid Building DataBlades That Know the Value of Everything and the Cost of Nothing. In Proceedings of the International Conference on Scientific and Statistical Database Management (SSDBM). 122–133.
[8] Joy Arulraj, Andrew Pavlo, and Prashanth Menon. 2016. Bridging the Archipelago between Row-Stores and Column-Stores for Hybrid Workloads. In Proceedings of the ACM SIGMOD International Conference on Management of Data.
[9] Manos Athanassoulis, Michael S. Kester, Lukas M. Maas, Radu Stoica, Stratos Idreos, Anastasia Ailamaki, and Mark Callaghan. 2016. Designing Access Methods: The RUM Conjecture. In Proceedings of the International Conference on Extending Database Technology (EDBT). 461–466.
[10] Shivnath Babu, Nedyalko Borisov, Songyun Duan, Herodotos Herodotou, and Vamsidhar Thummala. 2009. Automated Experiment-Driven Management of (Database) Systems. In Proceedings of the Workshop on Hot Topics in Operating Systems (HotOS).
[11] Don S. Batory, J. R. Barnett, J. F. Garza, K. P. Smith, K. Tsukuda, B. C. Twichell, and T. E. Wise. 1988. GENESIS: An Extensible Database Management System. IEEE Transactions on Software Engineering (TSE) 14, 11 (1988), 1711–1730.
[12] Philip A. Bernstein and David B. Lomet. 1987. CASE Requirements for Extensible Database Systems. IEEE Data Engineering Bulletin 10, 2 (1987), 2–9.
[13] Alfonso F. Cardenas. 1973. Evaluation and Selection of File Organization - A Model and System. Commun. ACM 16, 9 (1973), 540–548.
[14] Michael J. Carey and David J. DeWitt. 1987. An Overview of the EXODUS Project. IEEE Data Engineering Bulletin 10, 2 (1987), 47–54.
[15] M. J. Steindorfer and J. J. Vinju. 2016. Towards a Software Product Line of Trie-Based Collections. In Proceedings of the ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences.
[16] Surajit Chaudhuri and Vivek R. Narasayya. 1997. An Efficient Cost-Driven Index Selection Tool for Microsoft SQL Server. In Proceedings of the International Conference on Very Large Data Bases (VLDB). 146–155.
[17] Surajit Chaudhuri and Gerhard Weikum. 2000. Rethinking Database System Architecture: Towards a Self-Tuning RISC-Style Database System. In Proceedings of the International Conference on Very Large Data Bases (VLDB). 1–10.
[18] C. Loncaric, E. Torlak, and M. D. Ernst. 2016. Fast Synthesis of Fast Collections. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation.
[19] D. Cohen and N. Campbell. 1993. Automating Relational Operations on Data Structures. IEEE Software 10, 3 (1993), 53–60.
[20] E. Schonberg, J. T. Schwartz, and M. Sharir. 1979. Automatic Data Structure Selection in SETL. In Proceedings of the ACM Symposium on Principles of Programming Languages.
[21] E. Schonberg, J. T. Schwartz, and M. Sharir. 1981. An Automatic Technique for Selection of Data Representations in SETL Programs. ACM Transactions on Programming Languages and Systems 3, 2 (1981), 126–143.
[22] Alvin Cheung. 2015. Towards Generating Application-Specific Data Management Systems. In Proceedings of the Biennial Conference on Innovative Data Systems Research (CIDR).
[23] Stratos Idreos, Kostas Zoumpatianos, Brian Hentschel, Mike Kester, and Demi Guo. 2018. The Internals of the Data Calculator. Harvard Data Systems Laboratory, Technical Report (2018).
[24] O. Shacham, M. Vechev, and E. Yahav. 2009. Chameleon: Adaptive Selection of Collections. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation.
[25] P. Hawkins, A. Aiken, K. Fisher, M. C. Rinard, and M. Sagiv. 2011. Data Representation Synthesis. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation.
[26] P. Hawkins, A. Aiken, K. Fisher, M. C. Rinard, and M. Sagiv. 2012. Concurrent Data Representation Synthesis. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation.
[27] Y. Smaragdakis and D. Batory. 1997. DiSTiL: A Transformation Library for Data Structures. In Proceedings of the Conference on Domain-Specific Languages (DSL).
[28] Niv Dayan, Manos Athanassoulis, and Stratos Idreos. 2017. Monkey: Optimal Navigable Key-Value Store. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 79–94.
[29] Martin Dietzfelbinger, Torben Hagerup, Jyrki Katajainen, and Martti Penttonen. 1997. A Reliable Randomized Algorithm for the Closest-Pair Problem. J. Algorithms (1997).
[30] Jens Dittrich and Alekh Jindal. 2011. Towards a One Size Fits All Database Architecture. In Proceedings of the Biennial Conference on Innovative Data Systems Research (CIDR). 195–198.
[31] David Goldhirsch and Jack A. Orenstein. 1987. Extensibility in the PROBE Database System. IEEE Data Engineering Bulletin 10, 2 (1987), 24–31.
[32] Goetz Graefe. 1994. Volcano - An Extensible and Parallel Query Evaluation System. IEEE Transactions on Knowledge and Data Engineering (TKDE) 6, 1 (1994), 120–135.
[33] Goetz Graefe. 2011. Modern B-Tree Techniques. Foundations and Trends in Databases 3, 4 (2011), 203–402.
[34] Goetz Graefe, Felix Halim, Stratos Idreos, Harumi Kuno, and Stefan Manegold. 2012. Concurrency Control for Adaptive Indexing. Proceedings of the VLDB Endowment 5, 7 (2012), 656–667.
[35] Jim Gray. 2000. What Next? A Few Remaining Problems in Information Technology. ACM SIGMOD Digital Symposium Collection 2, 2 (2000).
[36] Richard A