functionally encrypted datastoresshwetaag/papers/fed.pdfconference’17, july 2017, washington, dc,...

Functionally Encrypted DatastoresRachit GargIIT Madras

[email protected]

Nishant KumarMicrosoft Research

[email protected]

Shweta AgrawalIIT Madras

[email protected]

Manoj PrabhakaranIIT Bombay

[email protected]

ABSTRACTWe introduce the notion of a Functionally Encrypted Datastorewhichcollects data from multiple data-owners, stores them encrypted onan untrusted server, and allows untrusted clients to make select-and-compute queries on the collected data. Little coordination andno communication is required among the data-owners or the clients.Our security and performance profile is similar to that of conven-tional searchable encryption systems, while the functionality weoffer is significantly richer. The client specifies a query as a pair(Q, f ) where Q is a filtering predicate which selects some sub-set of the dataset and f is a function on some computable valuesassociated with the selected data. We provide efficient protocolsfor various functionalities of practical relevance. We demonstratethe utility, efficiency and scalability of our protocols via extensiveexperimentation. In particular, we use our protocols to model com-putations relevant to the Genome Wide Association Studies such asMinor Allele Frequency (MAF), Chi-square analysis and HammingDistance.

KEYWORDSSearchable encryption, Computing on encrypted cloud storage,Genome wide association studies

ACM Reference Format:, Rachit Garg, Nishant Kumar, Shweta Agrawal, and Manoj Prabhakaran.2019. Functionally Encrypted Datastores. In Proceedings of ACM Conference(Conference’17). ACM, New York, NY, USA, 20 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn

1 INTRODUCTIONGiven the importance of cloud computing today, enabling controlledcomputation on large encrypted cloud storage is of much practicalvalue in various privacy sensitive situations. Over the last severalyears, several tools have emerged that offer a variety of approachestowards this problem, offering different trade-offs among security,efficiency and generality. While theoretical schemes based on mod-ern cryptographic tools like secure multi-party computation (MPC)

Permission to make digital or hard copies of all or part of this work for personal orclassroom use is granted without fee provided that copies are not made or distributedfor profit or commercial advantage and that copies bear this notice and the full citationon the first page. Copyrights for components of this work owned by others than ACMmust be honored. Abstracting with credit is permitted. To copy otherwise, or republish,to post on servers or to redistribute to lists, requires prior specific permission and/or afee. Request permissions from [email protected]’17, July 2017, Washington, DC, USA© 2019 Association for Computing Machinery.ACM ISBN 978-x-xxxx-xxxx-x/YY/MM. . . $15.00https://doi.org/10.1145/nnnnnnn.nnnnnnn

[24, 60], fully homomorphic encryption (FHE) [23] or functionalencryption (FE) [8, 53] can provide strong security guarantees, theircomputational and communication requirements are incompatiblewith most of the realistic applications today. At the other end areefficient tools like CryptDB [51], Monomi [56], Seabed [46] and Arx[49], which add a lightweight encryption layer under the hood ofconventional database queries, but, as we discuss later, offer limitedsecurity guarantees and do not support collaborative databases.While there also exist tools which seek to strike a different balanceby trading off some efficiency for more robust security guaranteesand better support for collaboration – like Searchable Encryption(starting with [19]), and Controlled Functional Encryption [42] –they offer limited functionality. In short, the existing solutions forsecure databases lack several key features (discussed in more detaillater).

In this work, we propose a new solution – called FunctionallyEncrypted Datastore (FED) – that can be used to implement a securecloud-based data store that collects data from multiple data-owners,stores it encrypted on an untrusted server, and allows untrustedclients to make select-and-compute queries on the collected data.Little coordination and no communication is required among thedata-owners or the clients. Our security and performance profile issimilar to that of conventional searchable encryption systems, whilethe functionality we offer is significantly richer.

Some Motivating Scenarios. There are many scenarios todaywhen data is collected from many users in a centralized databaseand made available to other users for querying. In such situations,the individuals whose data has been collected should be consid-ered the actual data-owners, who are providing their data withan expectation that it will be used only for certain well-specifiedpurposes. One such example is that of census data: governmentsuse census data for administrative purposes, and also provide re-stricted access to select researchers for select purposes [27, 62].Another scenario involves private corporations who can collectlarge amounts of information about their customers, which couldbe legitimately useful in improving their customer service. A thirdexample, and the one that we use in our experimental analysis, is of-fered by genomic studies. Genome Wide Association Studies (GWAS)look into entire genomes across different individuals to discoverassociations between genetic variants and particular diseases ortraits [10, 39, 40].

However, in all such scenarios, individuals’ privacy is vulnerableand the absence of fine grained control can lead to significant prob-lems. This was illustrated in an incident involving Ipsos Mori, a

https://doi.org/10.1145/nnnnnnn.nnnnnnn



Conference’17, July 2017, Washington, DC, USA Rachit Garg, Nishant Kumar, Shweta Agrawal, and Manoj Prabhakaran

British marketing research firm, who offered to share data pertain-ing to 27 million mobile phone users with the police [33]. Whilethey defended themselves by insisting that they will only releasesafe aggregated data which their users had acquiesced to [44], thedeal was shelved under public pressure. Similarly, GWAS studiesinvolve sharing highly sensitive information about the individualswith several researchers in different parts of the world, and theindividuals may have little control over how their data will be usedin future. Indeed, as evidenced by the case of the Havasupai tribeagainst the Arizona State University [1, 2], the researchers fromthe university collected genetic data for studying links to diabetesand later used it for other sensitive subjects like schizophrenia,migration and inbreeding, without the consent of the individualsor the community who contributed the data.

These are a few of the many scenarios that motivate the need toimplement fine-grained control over computation on data collectedfrom multiple data-owners. While legislation can help in this re-gard [4], technological solutions can go a long way in enforcingsuch legislation and also in enabling more legitimate uses of datawhile preventing privacy violations.

Addressing the Challenge. Guided by the above scenarios, weset our goals. We require a datastore that provides strong securityguarantees while offering flexible functionality — multiple dataowners, and support for expressive computation queries — andpractical efficiency. We briefly discuss what this entails.

Firstly, accommodating multiple data owners in a flexible andscalable manner calls for several features:

– No coordination among data-owners. The data-owners should notneed to trust, or even be aware of each other.

– Untrusted clients. The clients should not need to hold any state orkeymaterial (except as may be needed to authenticate themselvesto the servers, in a higher-level application). This allows openingup the datastore to a large set of clients without requiring totrust them.

– Data-owners oblivious of the clients. The data-owners should notneed to be aware of the clients who will query the datastore inthe future, and in particular, cannot provide them with any keys.Further, the data-owners need not be online when the queriesarrive from the clients.

This set of requirements creates a tension with the basic premiseof untrusted servers in a secure datastore: If the data-owners areoblivious of the clients and are not online when clients access thedatastore, then the datastore must be fully operable by the servers;but this allows the servers to freely access the collected data (e.g., byemulating the role of clients to make as many accesses as they wish).Resolving this tension necessitates a trust assumption: We require thatthere are multiple non-colluding servers. This is often a reasonableassumption in practice, given the growing availability of cloudcomputing services from competing service providers. We seek asolution that uses multiple servers minimally – there should be onlyone server with large storage (who stores all the encrypted data),and a single auxiliary server (who may store some key material).Either server by itself can be corrupt, but they will be assumed notto collude with each other.

To support expressive queries, we seek efficient implementationof typical (relational) database operations on a single table. To-wards this we seek to support a select-and-compute functionality.We consider queries to be specified as a pair (Q, f ) where Q is afiltering predicate which selects some subset of the dataset and fis a function on the selected data. A key feature we seek is that thecomputation overheads for a select-and-compute query should notscale with the entire database size, but only with the number ofselected records. This would offer substantial efficiency gains inmost practical applications: for instance, in a census-like databasefor a country, a selection query for a zip-code would result in asmall fraction of all the records to be chosen.

Finally, we seek a level of cryptographic security and efficiencythat is typical of existing searchable encryption works (e.g., [36, 43]).Here, the security guarantees are formulated in the form of an idealfunctionality (following the Universally Composable security frame-work) and well-defined leakage functions specified as part of thisfunctionality. These solutions are highly scalable and capable of han-dling large databases [12, 19, 32, 43], with the query-time complex-ity scaling with the number of selected rows (as discussed above).However, we note that these works are restricted to a single dataowner and only search queries (rather than search-and-compute).For FED, we shall set a similar security-efficiency combination asthe goal, while requiring a much richer functionality (i.e., multipledata owners and computation on selected data).

1.1 Our ResultsAs discussed above, we introduce the notion of a Functionally En-crypted Datastore (FED), which permits a data-owner to securelyoutsource its data to a storage server, such that, with the helpof an auxiliary server, clients can efficiently carry out select-and-compute queries. Apart from introducing the notion of FED ourcontributions include:• A general framework for instantiating FED schemes. The frame-work is modular, and consists of two components, which may beof independent interest (when only the corresponding function-ality is desired) – namely, Searchably Encrypted Datastore (SED)and Computably Encrypted Datastore (CED).• We provide instantiations of this framework for evaluating arbi-trary functions, as well as important special classes of functions(the latter more efficiently than the former). As part of this, we in-stantiate a multiple data-owner version of searchable encryption,which is of independent interest.• We demonstrate the utility and practicality of our protocols basedon extensive experimentation, involving realistic statistical anal-ysis tasks for genomic studies.Along the way we obtain several other results that are of inde-

pendent interest.– As a starting point for our constructions, we give construc-tions for “single data-owner” versions of FED, SED and CED(denoted by sFED, sSED and sCED, respectively), which are sim-pler and more efficient, albeit offering no support for multipledata-owners.

– One of our methods for upgrading sSED to SED is in the randomoracle model. As part of this solution, we augment the securityof an Oblivious Pseudorandom Function (OPRF) protocol in the

Functionally Encrypted Datastores Conference’17, July 2017, Washington, DC, USA

literature to be UC-secure in our corruption model, which allowsactive corruption of the clients, and passive corruption of theservers.

– We introduce a versatile collection of techniques under the um-brella of “onion secret-sharing” by leveraging “onion encryption”which was originally used in anonymous routing and mixednets [17, 52]. In our setting, we use these to distribute secretshares of multiple messages to multiple receivers, maintainingthe link across the shares of the same message while delinkingthe messages from their origin.

1.2 Overview of ConstructionsWe present several modular constructions of FED (and the single-dataowner version sFED), which can be instantiated in multipleways, by plugging in different implementations of its components.As discussed above, we identify two simpler primitives, SearchablyEncrypted Datastore (SED) and Computably Encrypted Datastore(CED), and show how they can be securely dovetailed into a con-struction of FED. Following is a roadmap to the constructions inthis work.

SEDCED

map &merge-map

‣ OPRF-based‣ SFE-based

sFEDSection 3.1

(see Figure 2)

FED

Section 4 (see Figure 3)

Section 4.4 Section 4.2(see Figure 4)

sCED

‣ Value Retrieval‣ Summation‣ Summation (alt)‣ General

Section 3.2 sSED

Section 3.3

Section 4.3

OnionSecret-Sharing

Section 4.1

sFEDD C

A

S

Z Q, f

f(Q[Z])

sCED

D CSA

X

Q

f(δT[X])

sSED

W

f

Q[W ]

T

Z Q, f

f(Q[Z])

CED

CSA

Xm

Q

f(δT[X])

SED

Wm

f

Q[W ]

T

Q, f

f(Q[Z])FED C

A

S

Z1

Q, f

f(Q[Z])

Dm

D1

Zm

..

. … DmD1ZmZ1

X1

W1

A S

QW1 Wm

…

Q[W ]

C

W Q

K"

Q[W ]map

SED

DmD1

K#

merge-map

The starting point of our constructions are single data-ownerversions sSED and sCED, shown at the bottom of the above map.We show that these components can be implemented by leveragingconstructions from the literature, namely, the Multi-Client Sym-metric Searchable Encryption (MC-SSE) scheme due to Jarecki etal. [31] and Controlled Functional Encryption (CFE) due to Naveedet al. [42]. The search query family supported by our sSED con-structions are the same as in [31]. For sCED, we support a fewspecialized functions, as well as a general function family. Theprimitives sSED and sCED are of independent interest, and alsothey can be combined to yield a single data-owner version of FED(called sFED).

To upgrade sSED and sCED constructions into full-fledged (multidata-owner) SED and CED schemes, we require several new ideas.One challenge in this setting is to be able to hide the association ofthe (encrypted) data items with their data-owners. Our approach isto first securely merge the data from the different data-owners in

a way that removes this association, and then use the single data-owner constructions on the merged data set. For this, both SEDand CED constructions rely on onion secret-sharing techniques(see Section 4.1 for an overview). In the case of CED, mergingessentially consists of a multi-set union. But in the case of sSED,merging entails merging “search indices” on individual data setsinto a search index for the combined data set. Since the data-ownersdo not trust (or are even aware of) each other, this operation mustbe implemented via the servers in an “oblivious” fashion, and onionsecret-sharing techniques alone are not adequate. We propose twoapproaches to merge the indices – one in the random oracle modelusing an Oblivious Pseudorandom Function (OPRF) protocol, andanother one with comparable efficiency in the standard model,relying on 2-party Secure Function Evaluation (SFE).

1.3 Related WorkBelow, we contrast our notion of Functionally Encrypted Datastoreswith prior work in terms of various features. A summary appearsin Table 1 in Appendix B.• Multiple data owners: We allow encrypted data to be col-

lected from multiple data-owners. The only prior works that allowthis are multiparty computation [24, 60], multi-input functionalencryption [26], multi-key fully homomorphic encryption [38] andcontrolled functional encryption (CFE) [42], all of which suffer costsproportional to the entire dataset.• Rich functionalitywith strong security: Symmetric search-

able encryption (SSE) [9, 12, 13, 19, 21, 43, 47]) supports keywordsearches and has performance sublinear in the size of the dataset,but does not allow computation on the search results. In contrast,tools such as CryptDB [51], Monomi [56], Seabed [46] and Arx [49]do allow full-fledged search-and-compute (searches are attributebased rather than keyword queries). But they incur higher leakageto the server and argue security in the “snapshot attacker model”which has been criticized for being unrealistically weak [28]. More-over, their security analysis assumes fully trusted clients who do notcollude with the server(s). In contrast, we offer a stronger securitymodel where clients may be malicious, either server can colludewith a subset of data owners and/or clients and the attacker ispersistent, i.e. has access to the view of the corrupt parties through-out the lifetime of the system. Leakage to servers is limited to theleakage typical in SSE, and there is no leakage to data owners orclients.• Computationally light clients: Our clients are very effi-

cient and only performwork proportional to the size of their queriesand outputs, typically independent of database size. In compari-son, the CryptDB family of constructions do not execute all thecomputational operations fully at the server, but instead requirethe clients to download certain intermediate results, decrypt themand perform the remaining computation. Similarly, the CFE schemeof [42] requires clients to evaluate a garbled circuit on the entiredataset. While multi client SSE has more lightweight clients thanthe above, the clients still do work proportional to the size of thefiltered data.• Deployability: We allow clients to join the system dynam-

ically – indeed, clients and data owners can be oblivious of each


SEDCED

map &merge-map

OPRF-basedSFE-based

sFEDSection II

(see Figure 2)

FED

Section IV (see Figure 3)

Section IV-D Section IV-B(see Figure 4)

sCED

Value RetrievalSummationSummation (alt)General

Section III-A sSED

Section III-B

Section IV-C

OnionSecret-Sharing

Section IV-A

sFEDD C

A

S

Z Q, f

f(Q[Z])

sCED

D CSA

X

Q

f(δT[X])

sSED

W

f

Q[W ]

T

Z Q, f

f(Q[Z])

CED

CSA

Xm

Q

f(δT[X])

SED

Wm

f

Q[W ]

T

Q, f

f(Q[Z])FED C

A

S

Z1

Q, f

f(Q[Z])

Dm

D1

Zm

..

. … DmD1ZmZ1

X1

W1

A S

QW1 Wm

…

Q[W ]

C

W Q

K"

Q[W ]map

SED

DmD1

K#

merge-mapFigure 1: The FED functionality. The dotted lines indicate leakage

from functionality. Note that we do not allow any leakage to thedata owners or the clients.

others’ identity (or even number) in contrast with prior construc-tions. However, we do rely on the availability of two non-colludingservers, which, as mentioned earlier, is necessary, and in our opin-ion, does not pose a significant barrier to deployability. Please seeAppendix B for a detailed comparison of our work with prior work.

2 FED FRAMEWORKWe use the notion of an ideal functionality [25] to specify our re-quirements from an FED system. An ideal functionality is an (imag-inary) trusted entity which interacts with all the players in thesystem, carrying out various commands from them. A scheme isconsidered secure if it carries out the same tasks as this trustedentity, with the same secrecy and correctness guarantees.

FED is formulated as a two stage functionality, involving aninitialization stage and a query stage. Figure 1 depicts the FEDfunctionality schematically. The parties involved are as follows:• Data-owners Di (for i = 1, · · · ,m, say) who are online onlyduring the initialization phase. Each data-owner Di has an inputZi ⊆ W ×X, where for each (w, x) ∈ Zi ,w form the searchableattributes and x the computable values. Z =

⋃i Zi denotes the

multi-set union of their inputs.• Storage Server S, which is the only party with a large storageafter the initialization stage.• Auxiliary Server A, which is assumed not to collude with S.Such models have been used in the SSE literature [30, 45, 47, 50]and implicitly in CFE [42].• Clients which appear individually during the query phase. Aclient C has input a query of the form (Q, f ) where Q :W →

{0, 1} is a search query on the attributes, and f is a computationfunction on a multi-set of values. It receives in return f (Q[Z ])where Q[Z ] denotes the multi-set consisting of elements in X,with the multiplicity of x being the total multiplicity of elementsof the form (w, x) in Z , but restricted to w that are selected byQ ; i.e., µQ [Z ](x) =

∑w ∈W:Q (w )=1 µZ (w, x), where µS (y) denotes

the multiplicity of an element y in the multi-set S.

Keyword Search Queries: The major search query families thathave received attention in the searchable encryption literature – andalso of interest to this work – are “keyword queries.”1 A keywordquery is either a predicate about the presence of a single keyword1We remark that the concept of searchable encryption has been generalized to moreexpressive forms of search, such as pattern matching, range queries and searchingover structured data [15, 16, 20, 37, 54, 56]. While our SED constructions do not focus

in a record (document), or a boolean formula over such predicates.In terms of the notation above, the searchable attribute for eachrecord is a set of keywords, w ⊆ K where K is a given keywordspace. That is,W = P(K), the power set ofK . A basic search querycould be a keyword occurrence query of the form Qτ , for τ ∈ K ,defined as Qτ (w) = 1 iff τ ∈ w . A more complex search querycan be specified as a boolean formula over several such keywordoccurrence predicates.

Composite Queries: We shall sometimes allow Q and f to bemore general than presented above. Specifically, we allow Q =(Q1, · · · ,Qd ), where eachQi :W → {0, 1} and f = (f0, f1, · · · , fd ),where for i > 0, fi are functions on multi-sets of values, and f0 is afunction on a d-tuple; we define

f (Q[Z ]) := f0(f1(Q1[Z ], · · · , fd (Qd [Z ])).

(This could be further generalized to recursively allow Qi and fi tohave the same general structure.)

For the functionality to be fully specified, we need to describethe leakage functions. Below, we shall present protocols along witha specification of the leakage to be included in the functionality, sothat the protocols securely realize the resulting functionality. As wedetail in Section 2.1, we use the model of Universally Composablesecurity, with a corruption model which forbids the storage serverS and the auxiliary server A from colluding with each other, andallows only the clients to be actively corrupt (the others beinghonest-but-curious).

2.1 Security ModelWe require provable security guarantees in the Universally Compos-able (UC) security framework [11] — i.e., in the real-ideal paradigm[25], with straight-line, black-box simulation in the presence of anarbitrary environment — but with a customized corruption model.In our corruption model all the parties can be passively corrupt(i.e., the honest-but-curious adversary model), but in addition theclients can be actively corrupt.2 Further, the storage server S and theauxiliary server A do not collude (i.e., cannot simultaneously becorrupt).

Our protocols will be modular: e.g., an FED scheme will be spec-ified in terms of two simpler functionalities, which themselves willbe implemented separately. To prove security, we can analyze theprotocol while retaining the simpler functionalities as ideal entities,thanks to the composability property of UC security (which holdsalso for mixed corruption models involving active and passive cor-ruptions, like the one we use) [11]. While the details of the analysisare routine, the important detail that falls out of this analysis that issignificant is the specification of the leakages, which we shall men-tion along with each of the protocols. More details on the securityanalysis is given in Appendix H.

on such search query families, our general framework applies to all these notions aswell.2While our protocols can be slightly modified to accommodate actively corrupt data-owners as well, we do not do so. This is because, even given the ideal FED functionality,one actively corrupt data-owner can choose to supply arbitrary data to the database,potentially invalidating the results from the entire database. To adequately handleactive corruption of the data-owners, the FED functionality should be augmentedwith the ability for the servers to enforce policies on the data being collected from thedata-owners.


3 SINGLE DATA-OWNER PROTOCOLSIn this section we present a single data-owner version of FED, de-noted by sFED, and also develop a secure protocol implementingthis functionality. This protocol in turn relies on two other new func-tionalities we introduce, namely, (single data-owner) SearchablyEncrypted Datastore (sSED) and Computably Encrypted Datastore(sCED). We begin by presenting these two functionalities.

3.1 Searchably Encrypted DatastoreRecall that in an FED or sFED scheme, a query has two components– a search query Q and a computation function f . The SearchablyEncrypted Datastore functionality (SED or sSED) has a similar struc-ture, but supports only the search query; all the records that matchthe search query are revealed to the storage server S. (Jumpingahead, the choice of S to be the party receiving the output ratherthan the client C is dictated by our plan to use this functionality inprotocols for FED and sFED.)

The functionality sSED is depicted in Figure 2: There is a singledata-owner D with inputW ⊆ W × I, where each element inWhas a unique identifier id ∈ I as its second coordinate; the outputthat S receives when a client C inputs Q is the set of identitiesQ[W ] ⊆ I.

In Section 3.5, we shall see that a multi-client version of Sym-metric Searchable Encryption (MC-SSE) from [31] can be used toconstruct an sSED scheme. The main limitations of MC-SSE com-pared to sSED are that (1) in the former the data-owner D remainsonline during the query phase (whereas in the latter D can be on-line only during the initialization phase), and (2) in the formerthe output is delivered to both S and C (whereas in the latter itmust be delivered only to S). In our construction in Section 3.5,we shall leverage the auxiliary server A to meet these additionalrequirements.

3.2 Computably Encrypted DatastoreThe second functionality we introduce – CED, or its single data-owner variant sCED– helps us securely carry out a computationon an already filtered data set. The complexity of this computationwill be related to the size of the filtered data set (rather than theentire contents of the data set).

In sCED, as shown in Figure 2, a single data-owner D (who staysonline only during the initialization phase) has an input in the formof X ⊆ I × X). Later, during the query phase, clients can computefunctions on a subset of data. More precisely, a client C can specifya function f from a pre-determined function family, and the storageserver S specifies a set T ⊆ I, and C receives f (δT [X ]) where wedefine δT (id, x) = x iff id ∈ T .

In Section 3.4, we present protocols for sCED, for various spe-cialized function families, as well as for a general function family.

3.3 sFED Protocol TemplateThe sFED protocol, which uses the above two functionalities, is asfollows.Dmaps its inputZ to a pair (W ,X ), whereW ⊆ W×I andX ⊆ I×X such that (w, x) 7→ ((w, id), (id, x))where id is randomlydrawn from (a sufficiently large set) I. Then, in the initializationphase of sFED, the parties D, S and A invoke the initialization phaseof sSED, and that of sCED (possibly in parallel). During the query

SEDCED

map &merge-map

OPRF-basedSFE-based

sFEDSection II

(see Figure 2)

FED



sCED


Section III-A sSED

Section III-B

Section IV-C

OnionSecret-Sharing

Section IV-A

sFEDD C

A

S

Z Q, f

f(Q[Z])

sCED

D CSA

X

Q

f(δT[X])

sSED

W

f

Q[W ]

T

Z Q, f

f(Q[Z])

CED

CSA

Xm

Q

f(δT[X])

SED

Wm

f

Q[W ]

T

Q, f

f(Q[Z])FED C

A

S

Z1

Q, f

f(Q[Z])

Dm

D1

Zm

..

. … DmD1ZmZ1

X1

W1

A S

QW1 Wm

…

Q[W ]

C

W Q

K"

Q[W ]map

SED

DmD1

K#

merge-map

Figure 2: sFED functionality (left) and the protocol template. Thedotted lines indicate leakage. The protocol template is in terms offunctionalities sSED and sCED (which are in turn instantiated us-ing separate protocols) and the parties do not communicate to eachother outside of (the instantiation of) these two functionalities.

phase of sFED, S, A and C first invoke the query phase of sSED, sothat S obtains T = Q[W ] as the output; then they invoke the queryphase of sCED and C obtains f (δT [X ]); note that δT [X ] = Q[Z ] ifthere are no collisionswhen elements are drawn fromI to construct(W ,X ) from Z . This protocol is summarized in Figure 2.

Leakage: The protocol above leaks, for every query (Q, f ) fromC, the set T = Q[W ] to S, and in addition provides S and A withthe leakage provided by the sSED and sCED functionalities (whichdepends on how they are instantiated). Note that since D choosesids at random, leaking T amounts to only leaking its pattern overmultiple queries: specifically, T1, · · · ,Tn contains only the informa-tion provided by the intersection sizes of various combinations ofthese sets. Formally, this leakage is given by

pattern(T1, · · · ,Tn ) := {��⋂i ∈S

Ti��}S ⊆[n]. (1)

We need to instantiate sSED and sCED protocols for the ap-propriate search query family and computation function family,respectively. For search queries, we shall focus on keyword searchesas mentioned in Section 2, allowing boolean formulas over key-word occurrence predicates. As for computation function families,we give a sCED construction for general function family; we alsopresent more efficient protocols for a few special functions, likesummation and value retrieval (i.e., returning the entire selecteddata without any further computation).

3.4 sCED ProtocolsWe present protocols for a few different computation function fam-ilies. We remark that sCED somewhat resembles a primitive calledControlled Functional Encryption [42], and our sCED protocolsinvolve similar ideas as in that work.

In each of the protocols below, D has an input X ⊆ I × Xduring the initialization phase. It will be convenient to define theset J ⊆ I as J = {id|∃x s.t. (id, x) ∈ X }. During each query, Shas an input T ⊆ J and C has an input f from the computationfunction family. Below, all our constructions use a PRF F .

Value Retrieval: This is the functionality associated with standardSSE, where the selected values, or documents, are retrieved withoutany further computation on them. There is a single function in thecorresponding computation function family, given by f (δT [X ]) =δT [X ]. When the client C and the data owner D are the same


party (as is the case in the simplest version of SSE), this can beimplemented in a straightforward fashion using a PRF. Below wegive a simple scheme which relies on A to extend this to a settingwith (multiple) clients who do not communicate directly with D.

• Initialization Phase: D picks a PRF key K , and defines βid :=xid ⊕ FK (id) and sends {(id, βid)}id∈X to S and K to A, who storethem.• Computation Phase: S sends T (randomly permuted) and afresh PRF keyK1 to A; it also sends {βid⊕FK1 (id) | id ∈ T } (underthe same permutation) to C. A sends {FK (id) ⊕ FK1 (id) | id ∈ T }to C. C outputs {ai ⊕bi }i , where {ai }i and {b}i are the messagesit received from S and A (in the same order).• Leakage: On initialization, the ID-set J is leaked to S. On eachquery, T is leaked to A.

Recall that, in the overall sFED protocol template, ids will bechosen randomly. Hence leaking J amounts to leaking only its size|X | (the data can be padded with dummy entries so that instead of|X |, only an upper bound on it is leaked), and leaking T amountsto only leaking its pattern over multiple queries (see Equation 1).

We briefly sketch the elements in the protocol that help it achievesecurity. In the initialization phase D secret-shares its data betweenthe two non-colluding servers, so that an adversary corruptingeither one learns no information about the data. In the computa-tion phase, C receives freshly randomized secret-shares (using thekey K1) of the answer to its query, with the elements randomlypermuted. This is because, if C does not collude with one of theservers it should receive no information other than the multi-set ofretrieved values, f (δT [X ]). In particular, it may not learn whetheran id selected by one query gets selected again under another query.The permutation and fresh secret-sharing ensures that its view canbe completely simulated just based on the multi-set of retrievedvalues, f (δT [X ]) = {xid |id ∈ T }. Note that if C colludes with one ofthe servers, this rerandomization has no effect, but also, in that case,it is allowed to learn T and there is no need for rerandomization.

Summation: The summation functionality f (S) =∑x ∈S x is a

natural and greatly useful functionality (where the summation is ina given abelian group which the domain of values is identified with).The following simple and efficient protocol yields a sCED schemefor summation, with A learning only the size and the “pattern”information about the input of S.

• Initialization Phase: D picks a PRF key K , and defines βid :=xid + FK (id) and sends {(id, βid)}id to S and K to A, who storethem.• Computation Phase: S defines the set R := {id | id ∈ T } and arandom value ρ; it sends (ρ,R) to A and γ := ρ +

∑id∈T βid to C.

A sends δ := ρ +∑α ∈R FK (α) to C. C outputs γ − δ .

• Leakage: On initialization, the ID-set J is leaked to S. On eachquery, T is leaked to A.

This protocol is a natural extension of the Value Retrieval schemeabove, with the same initialization phase. The client C receives afresh additive secret-sharing of the single output value it seeks. Thesecurity argument is similar to before; in particular, if C does notcollude with either server, its view can be completely simulatedfrom its output without any leakage.

In Appendix D we present an alternate protocol for summation,using additive homomorphic encryption, which has lower commu-nication complexity and also avoids the leakage of the filtered setT to A.

Value Retrieval and Summation for Vector Values:The aboveprotocols readily extend to the setting when each value x is a vector(x1, · · · , xm ), and the function f acts on a subset of attributes. Cwill send the coordinates of interest to A, but not to S, or to boththe servers (for added efficiency). The protocol could be seen asparallel executions of the original protocol, one for each coordinateof interest; if S does not have the information regarding the coor-dinates of interest, the execution is carried out for all coordinates,but at the last step, A sends to C only the shares for coordinatesincluded in f .

General Functions: Next we present a sCED scheme for generalfunctions. Our scheme could be seen as an adaptation of the CFEscheme of [42] for general functions, despite some important dif-ferences between the two models.3

A client C who wishes to evaluate a function f sends a circuitrepresentation of f to A. The inputs to this circuit are the values{x id}id∈T which none of the participants in the query phase (C,A nor S) knows. At a high-level, the idea is that A will constructa garbled circuit for f and sends it to S. For each input bit forthis circuit, there are two labels, which will both be encryptedby A using keys that are derived from a master key that the data-owner D gives it (as described below). All these encrypted labelsare sent along with the garbled circuit. To evaluate the garbledcircuit, S needs to know how to decrypt one out of the two labelscorresponding to each input position. To enable the evaluation,D would have provided S with the decryption key for the labelscorresponding to each bit of x id, for each id (during the initializationphase). A detailed description of this scheme is given in Figure 7in the Appendix. The leakage function for this construction is asfollows:

Leakage: On initialization, the ID-set J is leaked to S. On eachquery, T and f are leaked to A, and the circuit structure of f (asrevealed by the garbled circuit) is leaked to S.

The leakage can be further reduced by allowing C to specify anypart of the function as a private input to the circuit computing f(rather than being hardwired into the circuit). For each of the wirescorresponding to this private input, the two labels will be providedby C to A, and C will send one of these two labels to S (so thatneither S nor A by itself learns this input).

Composite Queries: Recall that a composite query consists ofQ = (Q1, · · · ,Qd ) and f = (f0, f1, · · · , fd ) such that f (Q[Z ]) :=f0(f1(Q1[Z ]), · · · , fd (Qd [Z ])). The sSED protocol for non-compositequeries directly generalizes to composite queries, by simply run-ning d instances of the original sSED functionality to let S learnTi = Qi [W ] for each i . But we need to adapt the sCED protocol toavoid revealing each fi (Qi [Z ]) to C.

Towards adapting the above non-composite queries sCED pro-tocols, note that in all of them, the last step has S and A sending

3In particular, the model of CFE conflates the client C and the storage server S; also, itdoes not allow the data-owner D to directly communicate with the auxiliary sever A(resulting in the use of public-key encryption in [42], which we avoid).


secret shares for the output to C (in the case of value retrieval andsummation protocols this is an additive sharing over an appropriategroup, and in case of general functions, for each output bit, oneshare (from A) is a map from a pair of labels to {0, 1}, and the othershare (from S) is a label). The reconstruction operation to calculatethe final output from these secret shares is simple and efficient. Weuse this common theme in our protocols to allow calculation ofcomposite queries.

One could modify these protocols as follows: Instead of S and Asending the shares toC, they carry out a secure 2-party computationof f0 with each input being secret-shared as above. This can beimplemented for a general f0 (e.g., finding the maximum) usingYao’s garbled circuits and oblivious transfer, or for special functionslike linear combinations using simple protocols (e.g., if f0 is addition,and the inputs are additively secret-shared in the same group, Sand A each can simply add the shares they have locally, and sendthe result to C). In case f0 is evaluated using Yao’s garbled circuits,the circuit structure used to evaluate f0 is leaked to S.

3.5 sSED ProtocolsIn this section, we discuss how to instantiate the building block ofsearchable encryption sSED used in our sFED protocols by adaptingconstructions of symmetric searchable encryption (SSE) from the lit-erature (see, e.g., [13] and references therein). Though the standardsetting for SSE is described in terms of keyword searches in a collec-tion of documents, it extends to searches over a table in a relationaldatabase as follows: Each row in the table is interpreted as a docu-ment, consisting of “keywords” of the form (attribute, value), onefor each attribute (column) in the table. Then a search query that isspecified as a formula over attribute-value equality predicates4 canbe encoded as a formula over keyword predicates.

In SSE, there are only two parties: a client and a server. To storeand search a databasewith an SSE scheme, the client (who is the dataowner) uploads an encrypted database to the server. Later, the clientcan carry out search queries on this database by interacting withthe server, while providing the server with minimal information(patterns in the queries and responses). Jarecki et al. [31] extendedSSE to a multi-client setting in which the data owner is separatefrom (and does not trust) the clients. Here, the data owner mustremain online when the clients query the server, as the untrustedserver should not be in a position to answer queries by itself.

Recall that in sSED, we further separate the roles of the dataowner and the query-phase assistant, by introducing an untrustedauxiliary server. This allows the data owner to be present only at theinitial phase when the database is constructed. More importantly, inthe multiple data owner setting (in which no one data owner can betrusted with access to all the data), it is crucial to avoid relying onany one data owner to control clients’ access to the entire database.Another difference between sSED and SSE is that the latter revealsthe search outcome (the IDs of the rows that match the search)to the client; in sSED, we seek to reveal this to S, but it cannotbe revealed to the client (as the clients are not provided with any

4Note that this encoding does not permit range queries, except for small ranges thatcan be written as a disjunction over equality predicates. However, SSE constructionsthat support range queries do exist [20, 37, 54], and the high-level approach of adaptingSSE schemes into sSED schemes given here applies to such constructions as well.

leakage). Finally, in sSED, we seek to handle active corruption ofclients.

In Appendix E, we sketch how to adapt the MC-OXT protocol of[31] to account for the above requirements of sSED. As a lightweightmodification, we can simply let the auxiliary server A play the roleof (1) the data owner during the query phase, and (2) the client.This incurs some leakage to A, namely the search queries and thesearch outcome. Note that when used in the sFED protocol template(Figure 2), since sSED is instantiated with random IDs, only thepattern of the search outcomes, as in Equation 1, rather than thesearch outcomes themselves are revealed to A, as well as to S. Alsosince many of the sCED protocols already leak this information toA, this provides an appropriate level of security for sSED protocolsto be used in the sFED protocol template.

We also present multiple security-efficiency tradeoffs relative tothe above protocol:• We can avoid leaking the search outcome to A, but only leak anupper bound on the size of the outcome, by using an additivehomomorphic encryption instead of plain public-key encryptionused in the SSE scheme of [31] to encrypt the ids. This solu-tion could be seen as an instance of onion secret-sharing (seeSection 4.1) that supports reusability.• We can retain the original efficiency of the SSE scheme, butslightly increase the leakage to S (only if the keyword queryinvolves a boolean condition over multiple keyword predicates)and avoid leaking the search outcome to A by simply omittingthe above mentioned layer of encryption.• Further, if we start with the OSPIR-OXT protocol in [31] thathandles actively corrupt clients, then we can avoid leaking thesearch queries to A.

4 FED PROTOCOLSOur FED protocol template is identical to that of the sFED protocol,except that the functionalities sSED and sCED are replaced by theanalogous multiple data-owner versions, SED and CED (see Fig-ure 6). In particular, in our FED protocol each data-owner Di mapsits input Zi to a pair (Wi ,Xi ), whereWi ⊆ W ×I and Xi ⊆ I ×Xsuch that (w, x) 7→ ((w, id), (id, x)) where id is randomly drawnfrom (a sufficiently large set) I.5 Then the data-owners simplyinvoke the protocols SED and CED (instead of sSED and sCED, inthe template for our sFED protocol). The main challenge then isin realizing the functionalities SED and CED, for reasonable leak-age functions. Below, we present our protocols for realizing thesefunctionalities.

4.1 Onion Secret-SharingIn going from single data-owner schemes to multi data-ownerschemes, we seek to make the collection of data-owners behave likea single entity (without interacting with each other), so that theycan communicate their collective data to the two servers in the formin which the underlying single data-owner scheme communicatesit. Note that from this collective data, neither server should be ableto link the records that are selected by a search query back to the

5I should be large enough so that we may assume that each (honest) data-owner willuse a unique id for each record (w , x ), disjoint from the set of IDs used by the others,except with negligible probability.


individual data owners from whom it originates. In principle, thisproblem can be solved generically using secure multi-party compu-tation techniques. However, for efficiency reasons, we develop asuite of techniques under the name of onion secret-sharing, thatcarefully combines secret-sharing and public-key encryption toachieve this.

Onion secret-sharing is a non-trivial generalization of the tra-ditional mix-nets [17]. In a mix-net, a set of senders D1, · · · ,Dmwant to send their messages M1, · · · ,Mm to a server S, with thehelp of an auxiliary server A (who does not collude with S), so thatneither S nor A learns the association between the messages andthe senders (except for the senders whom they collude with). Thisis easily achieved as follows: each Di sends JMi KPK S to A wherePKS is the public-key of S for a semantically secure public-key en-cryption scheme, and the notation JMKPK denotes encryption ofM using a public-key PK . A collects all such ciphertexts, sorts themlexicographically (or randomly permutes them), and forwards themto S; S decrypts them to obtain the multiset {M1, · · · ,Mm }.

Now consider the following task. Each sender Di wants to shareits messageMi between two servers S and A; that is, it setsMi =

σi⊕ρi , andwants to sendσi to S and ρi toA. While the senders wanttheir messages to get randomly permuted, the association betweenσi and ρi needs to be retained. Onion secret-sharing provides asolution to this problem, as follows:

Each Di sends J(ρi , ζi )KPK S where ζi is of the form Jσi KPKA . Amixes these ciphertexts and forwards them to S, who decrypts themto recover pairs of the form (ρi , ζi ). Now, S reshuffles (or sorts) thesepairs, stores ρi and sends ζi (in the new order); A recovers σi fromζi (in the same order as ρi are maintained by S).

This can be taken further to incorporate additional functionality.As an example of relevance to us, suppose A wants to add a short,private tag to the messages being secret-shared so that the tagpersists even after random permutation. For the data-items whichhave the same tag, A should not be able to tell which one originatedfrom which Di , and S should obtain no information about thetags (this proves useful in the protocol in Section 4.3.2). If A addsencrypted tags to the data items, then while permuting the dataitems, S would also need to rerandomize the ciphertexts holdingthe tags. We present an alternate approach, which does not requireadditional functionality from the public-key encryption scheme,but instead augments onion secret-sharing with extra functionality:

• Di creates a 3-way additive secret sharing of 0 (the all 0’s string),as αi ⊕ βi ⊕ γi = 0, and sends (αi , Jβi , ρi , Jγi ,σi KPKAKPK S ) to A.• A assigns tags τi for each of them, and sends (in sorted order)(τi ⊕ αi , Jβi , ρi , Jγi ,σi KPKAKPK S ) to S.• S sends (τi ⊕ αi ⊕ βi , Jγi ,σi KPKA ) to A, in sorted order; it storesρi (in the same sorted order).• A recovers (τi ⊕ αi ⊕ βi ⊕ γi ,σi ) = (τi ,σi ).

This allows S and A to receive all the shares (in the same permutedorder); S learns nothing about the tags; A cannot associate whichshares originated from which Di , except for what is revealed bythe tag. (Even if S or A collude with some Di , the unlinkability isretained for the remaining Di .)

In Section 4.3.2, we use a variant of the above scheme to let A tagentries in various lists with a pseudonym (multiple lists may get

the same pseudonym), before the lists are unpacked and shuffledagain, destroying the linkage to the lists, but retaining the tags. 6

4.2 Protocol Template for SEDWe describe a general protocol template to realize the SED function-ality, given access to the sSED functionality. The high-level plan isto let A create a merged database so that it can play the role of Dfor sSED. However, since we require privacy against A, the mergeddatabase should appear shorn of all information (except statisticsthat we are willing to leak). Hence, during the initialization phase,we not only merge the databases, but also replace the keywordswith pseudonyms and keep other associated data encrypted. Weuse pseudonyms for keywords (rather than encryptions) to sup-port queries: During the query phase, the actual keywords willbe mapped to these pseudonyms and revealed to A. These twotasks at the initialization and query phases are formulated as twosub-functionalities – merge-map and map. Before specifying ourtemplate, we describe these new functionalities below.

• merge-map takes as inputsWi from each Di ; it generates a pair“mapping key” Kmap = (KS,KA) (independent of its inputs), andcreates a “merged-and-mapped” input set, W . Merging simplyrefers to computing the (multi-set) union W =

⋃iWi ; map-

ping computes the multi-set W = {(w, id)|(w, id) ∈ W , w =

MKmap (w), and id = MKmap (id)}, where the mapping functionM

is to be specified when instantiating this template.7 It outputs KSto S and (W ,KA) to A. (Since KA will be stored by A, we requirethat it be short, independent of the data size.)• map takes KS and KA as inputs from S and A respectively, anda query Q from a client C; then it outputs a new query Q to C.We shall require that there is a decoding function D such thatQ[W ] = {DKS (id)|id ∈ Q(W )}, where W is as described above.

These functionalities may specify leakages to S and/or A, but notto the data-owners or clients. Note that since map gives an outputQ to C, we shall require that it can be simulated from Q .

Given these functionalities, the template for the SED protocol isshown in Figure 3. In this protocol, sSED is invoked with A playingthe role of the data-owner as well. The invocation ofmerge-map ispart of the initialization phase and that of map is part of the queryphase. S uses DKS to compute its output Q[W ] from Q(W ).

Note that A does not store any additional information betweenthe two phases (other than what the implementation of sSED re-quires).

Leakage: Since this protocol delivers the merge-mapped data W toA, it leaks certain statistical information about the merged dataW(not individual datasetsWi ) to A. The exact nature of the leakagedepends on the mapping-function M .8In addition, leakages from

6For simplicity, here we consider the tagging to be arbitrary, whereas in Section 4.3.2,it is done based on equality checks. Here we allow A to add tags while it knows thelink between data-items and Di ; in our application, this link is broken by an extraround of mixing.7For notational simplicity, we have specified the mapping using a single functionMoverW ∪ I. Typically, this function acts differently onW and I, using differentparts of the key Kmap .8Such leakage could be avoided by relying on secure two-party computation of acertain function between A and S during initialization, but with high communicationcosts.


merge-map, map and sSED (the latter on the merged dataset) willbe included in the leakage of this protocol.

SEDCED

map &merge-map

‣ OPRF-based‣ SFE-based

sFEDSection 3.1

(see Figure 2)

FED

Section 4 (see Figure 3)

Section 4.4 Section 4.2(see Figure 4)

sCED

‣ Value Retrieval‣ Summation‣ Summation (alt)‣ General

Section 3.2 sSED

Section 3.3

Section 4.3

OnionSecret-Sharing

Section 4.1

A S

QW1 Wm

…

Q[W ]

C

W Q

K"

Q[W ]map

sSED

DmD1

K#

merge-map

Figure 3: SED protocol template using three functionalities sSED,merge-map and map. Although not shown, all three functionalitiesused may specify leakage to S and A. In accessing sSED, A plays therole of both the (single) data-owner and the auxiliary server.

4.3 Instantiating SED Protocol TemplateTo instantiate the above template, we need protocols formerge-mapand map (in addition to a protocol for the single data-owner sSEDfunctionality). Below we present these two protocols for the caseof keyword search queries (see Section 2). For such queries animplementation of sSEDwas already presented in Section 3.5. Recallthat in this setting the searchable attribute for each record is a setof keywords,w ⊆ K . For concreteness, we shall require that everykeyword is represented using a bit string of the same length, sothat K ⊆ {0, 1}λ .

We give two constructions, with different efficiency-securitytrade-offs. Recall that each data-ownerDi has an inputWi ⊆ W×I.In both the solutions, Di shall use a representation ofWi as a setWi ⊆ K × I such that (w, id) ∈ Wi iff w = {τ |(τ , id) ∈ Wi }. Inthe two solutions below, M maps the keywords differently; butin both the solutions, an identity id that occurs in Wi is mappedto ζ idi ← JidKPK S (an encryption of id under S’s public-key). Inone scheme, we use an oblivious pseudorandom function (OPRF)protocol to calculateM , while the other relies on a secure functionevaluation for equality. Recall that a PRF FK (τ ) is called oblivious[41] if there is a two-party protocol in which the first party inputsτ , the second inputs K , the first learns the value of FK (τ ) and thesecond learns nothing. While an OPRF protocol is more complexthan secure equality evaluation, access to OPRF allows us to con-struct a simpler protocol. On the other hand, suitably efficient OPRFprotocols are known only in the heuristic Random Oracle model[31], whereas very efficient secure evaluation of equality is possiblein the standard model.

4.3.1 Construction Using Oblivious PRF. In this solution, the map-ping key KA will be empty, and KS will consist of a PRF key K ,in addition to a secret-key for a PKE scheme, SKS. M maps eachkeyword τ using a pseudorandom function with the PRF key K .This will be implemented using an oblivious PRF execution withS, both in merge-map and map. That is, for w ⊆ K we haveMKmap (w) = {FK (τ )|τ ∈ w}, where F is a PRF.

The pair of protocols for the functionalities merge-map andmap are shown in Figure 4. These protocols use two a priori bounds(or, includes such bounds as part of leakage) for each data-owner,Ni and Li such that |Wi | ≤ Ni and |Ki | ≤ Li , where Ki ={τ |∃id s.t. (τ , id) ∈ Wi } is the set of keywords in Di ’s data (re-call that Wi is a representation of Di ’s data as keyword-identitypairs).

We point out the need for OPRF evaluations. If we allowed S tosimply give the key K to every data-owner to carry out the PRFevaluations locally, then, if A colludes with even one data-ownerDi , it will be able to learn the actual keywords in the entire data-set.As described below, using an OPRF considerable improves on this.

Leakage: S learns the bounds Ni and Li for each i , and A learns∑i Ni ; from the map phase, S learns the number of keywords in

the query.Also, we specify what the outputW to A reveals (being an output,

this is not “leakage” in this protocol, but manifests as leakage in theSED protocol that uses this functionality).W reveals an anonymizedversion of the keyword-identity incidence matrix of the mergeddata,9where the actual labels of the rows and columns (i.e., thekeywords and the ids) are absent. If A colludes with some data-owners, it learns the actual keyword labels of all the correspondingkeywords held by the colluding data-owners.

OPRF Protocol. Since we require security against actively corruptclients, we need a UC secure protocol for OPRF that is secure againstactive corruption of the receiver (and passive corruption of the partywith the key). We present such a protocol in the random oraclemodel by modifying the OPRF protocol of [31].

We use a group of prime orderp in which wemake the DecisionalDiffie-Hellman assumption. The receiver, whose input is x , samplesr ∈ [p] uniformly and sends a = H0(x)r to the key-holder, whereH0 is a random oracle that maps into the group. The key-holderreplies with b = aK , where K ∈ [p] is the key. The receiver outputsH1(b1/r ), where H1 is another random oracle.10

4.3.2 Construction Using Secure Equality Checks. Our next proto-col avoids OPRF evaluations. The idea here is to allow each dataowner to send secret-shared keywords between S and A, and relyon secure function evaluation (SFE) to associate the keywords withpseudonymous handles, so that the same keyword (from differentdata owners) gets assigned the same handle (but beyond that, thehandles are uninformative). Further, neither server should be ableto link the keyword shares with the data owner from which it origi-nated, necessitating a shuffle of the outputs. Due to the complexityof the above task, a standard application of SFE will be very expen-sive. Instead, we present a much more efficient protocol that relieson secure evaluation only for equality checks, with the more com-plex computations carried out locally by the servers; as a trade-off,we shall incur leakage similar to that of the OPRF-based protocolabove. Below, we sketch the ideas behind the protocol, with theformal description in Figure 9.

9This is a 0-1 matrix with 1 in the position indexed by (τ , id) iff (τ , id) ∈⋃i Wi . The

rows and columns may be considered to be ordered randomly.10The original protocol in [31] did not include the use of H1 . Using H1 allows us toobtain UC security against actively corrupt receivers.


merge-map:

• Keys. S generates a keypair for PKE, (PKS, SKS), and similarly A generates (PKA, SKA). The keys PKS and PKA are published. S also generates a PRF key K .• Oblivious Mapping. Each data-owner, for each τ ∈ Ki , Di engages in an Oblivious PRF (OPRF) evaluation protocol with S, in which Di inputs τ , S inputsK and Di receives as output τ := FK (τ ). Di carries out Li − |Ki | more OPRF executions (with an arbitrary τ ∈ Ki as its input) with S.• Shuffling. Each data-owner Di computes ζ id

i ← JidKPKS , for each id such that there is a pair of the form (τ , id) ∈ Wi . All the data-owners use the help of Sto route the set of all pairs (τ , ζ id

i ) to A, as follows:– Each Di , for each (τ , id) computes ξ idi ,τ ← J(τ , ζ id

i )KPKA , and sends it to S. Further Di computes Ni − |Wi | more ciphertexts of the form J⊥KPKA .– S collects all such ciphertexts (Ni of them from Di , for all i ), and lexicographically sorts them (or randomly permutes them). The resulting list is sent to A.– A decrypts each item in the received list, and discards elements of the form ⊥ to obtain a set consisting of pairs of the form (τ , id), where id = ζ id

i . W isdefined as the set of pairs of the form (w , id) where w contains all τ values for each id.

• Outputs. A’s outputs are W and an empty KA; S’s output is KS = (SKS, K ).

map: A client C has an input Q which is specified using one or more keywords. For each keyword τ appearing in Q , C engages in an OPRF execution with S,who inputs the key K that is part of KS. The resulting query is output as τ .

Figure 4: Protocols implementing merge-map and map functionalities for instantiating the template for SED protocols, based on OPRF.

Consider two data owners who have shared keywords τ1 andτ2 among A and S, so that A has (α1,α2) and S has (β1, β2), whereτ1 = α1 ⊕ β1 and τ2 = α2 ⊕ β2. Note that τ1 = τ2 ⇔ α1 ⊕ β1 =α2 ⊕ β2 ⇔ α1 ⊕ α2 = β1 ⊕ β2. So, for A to check if τ1 = τ2, Aand S can locally compute α1 ⊕ α2 and β1 ⊕ β2 and use a secureevaluation of the equality function to compare them (with onlyA learning the result). By comparing all pairs of keywords in thismanner, A can identify items with identical keywords and assignthem pseudonyms (e.g., small integers), while holding only oneshare of the keyword.

We note that using securely pre-computed correlations, secureevaluation of the equality function can be carried out very effi-ciently, simply by exchanging a pair of values in a sufficiently largefield. Please refer to Appendix F for a description of the protocol.

The data shared by the data owners to the servers includes notjust keywords, but also the records (document identifiers) associ-ated with each keyword. To minimize the number of comparisonsneeded, each data owner packs all the records corresponding toa keyword τ into a single list that is secret-shared along with thekeyword (rather than share separate (τ , id) pairs). However, afterhandles have been assigned to the keywords, such lists should beunpacked and shuffled to erase the link between multiple recordsthat came from a single list (and hence the same data owner). Eachentry in each list can be tagged by A using the keyword handleassigned to the list, but then the entries need to be shuffled while re-taining the tag itself. How this can be accomplished using the onionsecret-sharing technique was already discussed in Section 4.1.

The full protocol involves a few other details: the initial secretsharing of the keywords are delivered to the two servers usingonion secret-sharing; the number of entries in individual lists foreach keyword, and the total number of lists from each data ownerare padded up; the dummy entries in a list are culled jointly by thetwo servers. We refer the reader to the detailed description of theprotocol in Figure 9.

We note that this construction uses O((∑i Li )

2) secure equal-ity checks, where Li is an upper bound on the total number ofkeywords in Di ’s data. In comparison, the previous constructionneeded O(

∑i Li ) OPRF evaluations. Even though secure equality

checks have a low online cost, when the number of keywords in-volved is large, the quadratic complexity can offset this advantage.Hence, in Appendix F, we provide a mechanism to improve the

efficiency of this construction, using a partition of the keywordspace into bins.Leakage: The leakage is identical to that in the OPRF based con-struction above, with the following additions: During the merge-map phase S learns an upper bound L ≥ |

⋃i Ki |; from the map

phase, A learns the pattern of the keywords in a query. In Appen-dix G, we describe how the leakage may be further reduced.

4.4 CED ProtocolsUpgrading an sCED protocol to a CED protocol is simpler than thesSED to CED transformation: Since there are no indices to be com-puted as part of a sCED protocol, we do not need to reconstruct amerge-mapped dataset. Indeed, instead of using sCED functionalityas a blackbox, we can directly modify the initialization phase ofthe sCED protocol. Since the sCED protocols tend to keep all thedata records secret-shared between S and A, we need only have thedata owners route their data shares to the two servers using onionsecret-sharing. We show how the sCED schemes in Section 3.4 aremodified in this manner.

Value Retrieval and Summation: Below we describe how thesCED scheme in Section 3.4 for Value Retrieval is adapted into aCED scheme. Adaptation of the Summation scheme is similar.

In the sCED scheme for Value Retrieval, every value xid wassecret-shared between A and S as xid = αid ⊕ βid, where αid =FK (id), with K being a PRF key that the data-owner and A shared(and S did not have access to). However, this is not viable in a multidata-owner setting, because if S colludes with any one data-owner itwill learn this key; alternatively, if each data-owner uses a differentkey, this would let A collect the different data-items as comingfrom different data-owners, revealing additional statistics aboutthat data as well as about the answer to queries, beyond what amerged data-set reveals. To prevent such leakage, we delink the keyto the data owner and link it to each record, i.e. we generate a newkey for each record (xid) that Di sends. We present the modifiedprotocol below.• Modified Initialization Phase:(1) S generates PKE key-pairs (PKS, SKS), and publishes the public-

keys.(2) For each (id, x) ∈ Xi , Di picks a PRF key Kid and calculates

βid = FKid (id) ⊕ xid (where F is a PRF with appropriate inputand output lengths). Di sends {JδidKPKA }id to S, where δid =


(JKidKPK S , (id, βid)). (This is possibly padded with randomentries, where δid = ⊥).

(3) S collects all the data from different data owners, sorts themlexicographically and sends them to A.

(4) A receives the data, decrypts it to calculate {JKidKPK S , (id, βid)}id.It then picks a PRF key K . Let γid = βid ⊕ FK (id). It sends{(JKidKPK S , (id,γid))}id to S.

(5) S, for each id, decrypts to calculate Kid. It defines θid ←γid ⊕ FKid (id). (Note that θid = xid ⊕ FK (id)).

(6) S stores {(id, θid)}id while A stores K .The computation phase proceeds exactly as before.

The modified initialization phase leaks (an upper bound on) |Xi |for each data-owner to S and the combined ID-setJ to both A and S.(We remark that, in the FED protocol which uses this functionality,the IDs are chosen randomly and hence leaking the ID-set J onlyamounts to leaking the size |X | of the combined data set.)

General Functions: The sCED scheme for general functions fromSection 3.4 (based on the CFE scheme of [42]) can be easily adaptedto the multi data-owner setting. Incidentally, the original CFEscheme of [42] was already designed for multiple data-owners,but without any anonymization requirement on the inputs. Theonly essential modification needed in this case is to route the datafrom the data-owners to S anonymously (with the help of A), asdescribed below. Also, in the modified scheme the data-owners donot provide keys to A to facilitate computation of the keys used toencrypt the labels for each id, but instead the requisite keys willbe kept encrypted with S. The modifications are shown in Figure 7(below the single data-owner version). There is no additional leak-age to S compared to the single data-owner version, and A learnsan upper bound on the size of the individual ID sets |Ji | for eachdata-owner Di .

5 IMPLEMENTATION AND EXPERIMENTALRESULTS

In this section, we report details of our implementation and theexperiments we performed to demonstrate the scalability and ap-plicability of our protocols. The implementation used C++11 withSTL support. Various opensource cryptographic libraries and im-plementations (including OpenSSL, libPaillier, NTL, JustGarbledand Obliv-C) were used to implement the various components inour protocol, as detailed in Appendix I. The MC-OXT protocol andthe modified OPRF scheme of Jarecki et al. [31] (see Section 4.3.1)were reimplemented for use in our SED protocols.

To test the feasibility of using our sFED and FED schemes forrealistically large-scale computations, we measured their perfor-mance on tasks representative of Genome Wide Association Studies(GWAS). The specific functions we choose were adapted from theiDash competition [3, 34] for secure outsourcing of genomic data.Each record in a GWAS database corresponds to an individual, withfields corresponding to demographic and phenotypic attributes(like sex, race, diseases, etc.), as well as genetic attributes. Thegenetic attributes of interest – called Single Nucleotide Polymor-phisms (SNPs) – describe, for each of several loci (i.e., positions) inthe genome, the “genotype” at that locus (typically, one of a smallnumber of possibilities). In a typical query, the demographic andphenotypic attributes are used for filtering (e.g., South Asian males

with type 2 diabetes [14, 35]), and statistics are computed over thegenetic information. In our experiments, the following statisticswere computed.

• Minor Allele Frequency (MAF): In what is called a biallelicsetting, each locus of interest has one of three genotypes – sayAA, BB or AB (corresponding to two “alleles” A and B, and twochromosomes). Given a set of N individuals (selected using asearch filter) and a locus of interest, the total allele counts aredefined as nA = 2nAA + nAB and nB = 2nBB + nAB , wherenAA, nBB and nAB are the number of individuals of the threegenotypes. Then, MAF for that locus is defined [34] as the realnumber

min(nA,nB )

nA + nB=min(nA, 2N − nA)

2N.

Computing MAF at a given locus can be implemented as a specialinstance of a composite query of the form f0(f1(Q[Z ]), f2(Q[Z ]),where Q is the demographic/phenotypic filter, and f1 computesN = |Q[Z ]|, f2 computes nA, and f0 the MAF itself (upto a finiteprecision) from nA and N . Note that if we encode the genotypesBB,AB,AA as 0, 1, 2 respectively, then nA is simply the sum ofthe encoded genotype values. Hence we shall be able to use oursummation (and alternate summation) sCED and CED protocols.• Chi-square analysis (ChiSq): To quantify the association be-tween an allele and a disease, often a chi-square (χ2) statisticdefined below is employed [18]. Two groups of individuals – caseand control groups – are selected using two search filters Q andQ ′ (with and without a disease). Defining (nA,nB ) and (n′A,n

′B )

as above, but for Q and Q ′ respectively, the χ2 square statistic[5, 34] can be formulated as

χ 2 =(N + N ′)(nAN ′ − n′AN )

2

NN ′(nA + n′A)((N + N′) − (nA + n′A))

where N = (nA +nB ) and N ′ = (n′A +n′B ). Computing this quantity

corresponds to a composite queryf0(f1(Q[Z ]), f2(Q[Z ]), f3(Q ′[Z ]), f4(Q ′[Z ]), where f1, f2, f3, f4 com-pute nA,N ,n′A,N

′ respectively, and f0 implements the aboveformula (up to finite precision).• Average Hamming Distance (Hamming): The Hamming dis-tance between 2 genome sequences is often used as a metricfor genome comparison [48, 59]. Here, we use the entire geno-type vector x for each individual (rather than the genotype at agiven locus). The function fx ∗ (Q[Z ]) =

∑x∈Q [Z ] ∆(x ,x ∗)|Q [Z ] | , where ∆

stands for the hamming distance of two strings. We implementthis using our general functions protocol.• Genome retrieval (GenomeRet): This is a retrieval task wherewe retrieve the genome data at some locus for the set of individ-uals selected by the search filter. We implement this using ourvalue retrieval protocol.

Dataset andQueries. We conducted our experiments on syntheticdata generated randomly. Our dataset sizes are typically between10000 − 100000 records and 200 SNPs, inspired by real-world ap-plications [29, 57, 58]. For the Hamming functionality, we used5000−25000 records and 200 SNPs. For searching on these databases,we carefully chose our parameters (search keywords) so that the


number of filtered records ranged between 600 − 3500 for the Ham-ming functionality and between 2000 − 12000 for other functionali-ties. All the reported values are averaged over the same 10 queriesto account for experimental variation.

Results. We performed experiments in both the single (sFED)and multiple data owner (FED) setting. For the FED setting, weconsider both protocols, i.e. based on oblivious PRF (FED OPRF)and secure function evaluation on equality function (FED SFE).Results are reported over two metrics - total communication costand gross computation time across all entities. The computationtime of any entity excludes time taken to read/write to disk andalso time taken to transfer data over the network. These metricsare contrasted between FED and SED and are reported for theinitialization and query phases. The relevant entities were simulatedusing TCP clients and servers running on localhost. All experimentswere performed on a Linux machine with 64 GB RAM and IntelCore i7-7700 CPU with processing frequency of 3.60 GHz. All thedatabases were pre-loaded into the RAM.

Our results are summarized in Figure 5. The initialization costsare plotted against the total number of input records in the initialdatabase, while the query costs are plotted against the numberof filtered records. In each case, we plot the cost of the full sFEDprotocol, as well as its sSED component (dashed lines), as the latteris a proxy for searchable encryption costs from the literature andserves as a baseline. In the initialization phase, the sFED protocolsrequire only a small constant overhead over the sSED component,whereas in the query phase the overhead is mostly negligible. Anexception to the latter trend is the case of the Average Hammingfunctionality (Figure 5 (d)) where a large function is computedusing a garbled circuit (the query complexity is still dependent onthe filtered records; not on the entire dataset). A general featureof all our protocols (not shown in the graphs) is that most heavycomputation is performed at the two servers while the client is veryefficient: across all protocols, the client’s computation time neverexceeds 3 milliseconds. A few more observations follow.

Initialization Phase. In all our protocols, the initialization costsare linear in the number of input records. Here, the speed-up ofsFED over FED OPRF ranges between 6 to 10 for different protocols.This is explained by the extensive use of public key operations inthe latter setting. However, our obtained numbers are still efficientin practice. Additionally, the SFE protocol is slower than the OPRFprotocol by approximately a factor of 1.5 for the FED componentand factor 2 for the SED component. This is due to the level ofnesting of public key operations and onion secret sharing in theSED component. The communication costs for the initializationphase also reflect the same trends.

Query Phase. In all our protocols, the query time is linearly pro-portional to the number of filtered records. This is partly a conse-quence of our dataset construction, where the frequency of anykeyword is proportional to the size of the filtered dataset.11 This re-striction helps us simplify the comparison of the FED protocols withtheir SED components. Also we note that the differences between

11Note that on an arbitrary dataset, our CED protocols will still run in time linear inthe filtered dataset size, but SED query time is linear in frequency of the query’s leastfrequent keyword – which may not equal the size of the filtered dataset in general.

the two FED protocols (with OPRF and SFE) and the sFED protocolare negligible, showing that the map protocols add little overheadrelative to the query phase of sFED. Please refer to Appendix J formore details.

REFERENCES[1] Havasupai tribe and the lawsuit settlement aftermath. http://genetics.ncai.org/

case-study/havasupai-Tribe.cfm.[2] Indian tribe wins fight to limit research of its dna. https://www.nytimes.com/

2010/04/22/us/22dna.html?pagewanted=all&_r=1&.[3] Idash privacy & security workshop 2015. http://www.humangenomeprivacy.org/

2015/competition-tasks.html, 2015.[4] Eu general data protection regulation (gdpr): Regulation (eu) 2016/679. https:

//gdpr-info.eu/, 2016.[5] Jon Baker. Comparing two populations. http://grows.ups.edu/curriculum/

Comparing%20Two%20Population1.htm.[6] Mihir Bellare, Viet Tung Hoang, Sriram Keelveedhi, and Phillip Rogaway. Effi-

cient garbling from a fixed-key blockcipher. Cryptology ePrint Archive, Report2013/426, 2013. https://eprint.iacr.org/2013/426.

[7] John Bethencourt. Paillier library, 2006. http://acsc.cs.utexas.edu/libpaillier/.[8] Dan Boneh, Amit Sahai, and Brent Waters. Functional encryption: Definitions

and challenges. In TCC, pages 253–273, 2011.[9] Raphael Bost. Σoφoς : Forward secure searchable encryption. In Proceedings

of the 2016 ACM SIGSAC Conference on Computer and Communications Security,pages 1143–1154. ACM, 2016.

[10] William S Bush and Jason H Moore. Genome-wide association studies. PLoScomputational biology, 8(12):e1002822, 2012.

[11] R. Canetti. Universally composable security: A new paradigm for cryptographicprotocols. In Proceedings of the 42Nd IEEE Symposium on Foundations of ComputerScience, FOCS ’01, 2001.

[12] David Cash, Joseph Jaeger, Stanislaw Jarecki, Charanjit S. Jutla, Hugo Krawczyk,Marcel-Catalin Rosu, and Michael Steiner. Dynamic searchable encryption invery-large databases: Data structures and implementation. In 21st Annual Networkand Distributed System Security Symposium, NDSS, 2014.

[13] David Cash, Stanislaw Jarecki, Charanjit S. Jutla, Hugo Krawczyk, Marcel-CatalinRosu, and Michael Steiner. Highly-scalable searchable symmetric encryptionwith support for boolean queries. In CRYPTO, 2013.

[14] John C Chambers, James Abbott, Weihua Zhang, Ernest Turro, William R Scott,Sian-Tsung Tan, Uzma Afzal, Saima Afaq, Marie Loh, Benjamin Lehne, et al. Thesouth asian genome. PLoS One, 9(8):e102645, 2014.

[15] Melissa Chase and Seny Kamara. Structured encryption and controlled disclosure.In ASIACRYPT, 2010.

[16] Melissa Chase and Emily Shen. Substring-searchable symmetric encryption.PoPETs, 2015, 2015.

[17] David L. Chaum. Untraceable electronic mail, return addresses, and digitalpseudonyms. Commun. ACM, 24(2):84–90, February 1981.

[18] Geraldine M Clarke, Carl A Anderson, Fredrik H Pettersson, Lon R Cardon,Andrew P Morris, and Krina T Zondervan. Basic statistical analysis in geneticcase-control studies. Nature protocols, 6(2):121, 2011.

[19] Reza Curtmola, Juan Garay, Seny Kamara, and Rafail Ostrovsky. Searchablesymmetric encryption: improved definitions and efficient constructions. InProceedings of the 13th ACM conference on Computer and communications security,pages 79–88. ACM, 2006.

[20] Ioannis Demertzis, Stavros Papadopoulos, Odysseas Papapetrou, Antonios Deli-giannakis, and Minos Garofalakis. Practical private range search revisited. Insigmod, 2016.

[21] Whitney A. Drazen, Emmanuel Ekwedike, and Rosario Gennaro. Highly scalableverifiable encrypted search. In 2015 IEEE Conference on Communications andNetwork Security, CNS, pages 497–505, 2015.

[22] Ben A Fisch, Binh Vo, Fernando Krell, Abishek Kumarasubramanian, VladimirKolesnikov, Tal Malkin, and Steven M Bellovin. Malicious-client security in blindseer: a scalable private dbms. In Security and Privacy (SP), 2015 IEEE Symposiumon, pages 395–410. IEEE, 2015.

[23] Craig Gentry. Fully homomorphic encryption using ideal lattices. In STOC, pages169–178, 2009.

[24] O. Goldreich, S. Micali, and A. Wigderson. How to play any mental game. InSTOC, 1987.

[25] Oded Goldreich. Foundations of Cryptography: Basic Applications. CambridgeUniversity Press, 2004.

[26] Shafi Goldwasser, S. Dov Gordon, Vipul Goyal, Abhishek Jain, Jonathan Katz,Feng-Hao Liu, Amit Sahai, Elaine Shi, and Hong-Sheng Zhou. Multi-inputfunctional encryption. In EUROCRYPT, pages 578–602, 2014.

[27] US Government. http://www.census.gov/ces/dataproducts/index.html.[28] Paul Grubbs, Thomas Ristenpart, and Vitaly Shmatikov. Why your encrypted

database is not secure. In Proceedings of the 16th Workshop on Hot Topics in

http://genetics.ncai.org/case-study/havasupai-Tribe.cfm

http://genetics.ncai.org/case-study/havasupai-Tribe.cfm

https://www.nytimes.com/2010/04/22/us/22dna.html?pagewanted=all&_r=1&

https://www.nytimes.com/2010/04/22/us/22dna.html?pagewanted=all&_r=1&

http://www.humangenomeprivacy.org/2015/competition-tasks.html

http://www.humangenomeprivacy.org/2015/competition-tasks.html

https://gdpr-info.eu/

https://gdpr-info.eu/

http://grows.ups.edu/curriculum/Comparing%20Two%20Population1.htm

http://grows.ups.edu/curriculum/Comparing%20Two%20Population1.htm

https://eprint.iacr.org/2013/426

http://acsc.cs.utexas.edu/libpaillier/

http://www.census.gov/ces/dataproducts/index.html


20000 60000 1000000

200

400

600

800

Com

m (M

B)

Init Communication Size

3000 6000 9000 12000

2

4

6

8

10Query Communication Size

20000 60000 100000#(input records)

0

100

200

300

400

Tim

e (s

ec)

Init Gross Computation Time

3000 6000 9000 12000#(filtered records)

2

4

6

8

10

Query Gross Computation Time

sFEDsSEDFEDOPRFSEDOPRFFEDSFESEDSFE

(a)

20000 60000 1000000

200

400

600

800

1000

Com

m (M

B)


3000 6000 9000 12000

2

4

6

8

Query Communication Size

20000 60000 100000#(input records)

0

100

200

300

400

Tim

e (s

ec)



2

4

6

8

10



(b)

20000 60000 1000000

200

400

600

800

1000

1200

1400

Com

m (M

B)


3000 6000 9000 12000

2

4

6

8

10

12

14

16Query Communication Size

20000 60000 100000#(input records)

0

100

200

300

400

500

Tim

e (s

ec)



5

10

15

20



(c)

5000 15000 25000

0

500

1000

1500

2000

2500

3000

Com

m (M

B)


800 1600 2400 3200

0

1000

2000

3000

4000

5000

Query Communication Size

5000 15000 25000#(input records)

0

20

40

60

80

100

Tim

e (s

ec)



0

5

10

15

20

25

30Query Gross Computation Time


(d)

Figure 5: Total communication cost and computation time in the initialization and query phases for single data owner and multiple dataowner setting (contrasting two SED protocols, OPRF - Section 4.3.1, SFE - Section 4.3.2) for applications: (a) Genome Retrieval, (b) MAF, (c)Chi-square analysis, (d) Average Hamming.

Operating Systems, pages 162–168. ACM, 2017.[29] Megumi Hirokawa, Hiroyuki Morita, Tomoyuki Tajima, Atsushi Takahashi, Kyota

Ashikawa, Fuyuki Miya, Daichi Shigemizu, Kouichi Ozaki, Yasuhiko Sakata,Daisaku Nakatani, et al. A genome-wide association study identifies plcl2 andap3d1-dot1l-sf3a2 as new susceptibility loci for myocardial infarction in japanese.European Journal of Human Genetics, 23(3):374, 2015.

[30] Yuval Ishai, Eyal Kushilevitz, Steve Lu, and Rafail Ostrovsky. Private large-scaledatabases with distributed searchable symmetric encryption. In Cryptographers’Track at the RSA Conference, pages 90–107. Springer, 2016.

[31] Stanislaw Jarecki, Charanjit S. Jutla, Hugo Krawczyk, Marcel-Catalin Rosu, andMichael Steiner. Outsourced symmetric private information retrieval. In CCS,2013.

[32] Seny Kamara, Charalampos Papamanthou, and Tom Roeder. Dynamic searchablesymmetric encryption. In Proceedings of the 2012 ACM conference on Computerand communications security, pages 965–976. ACM, 2012.

[33] Richard Kerbaj and Jon Ungoed-Thomas. http://www.thesundaytimes.co.uk/sto/news/uk_news/National/article1258476.ece?CMP=OTH-gnws-standard-2013_05_11.

[34] Miran Kim and Kristin Lauter. Private genome analysis through homomorphicencryption. In BMC medical informatics and decision making, volume 15, page S3.BioMed Central, 2015.

[35] Jaspal S Kooner, Danish Saleheen, Xueling Sim, Joban Sehmi, Weihua Zhang,Philippe Frossard, Latonya F Been, Kee-Seng Chia, Antigone S Dimas, NeelamHassanali, et al. Genome-wide association study in people of south asian an-cestry identifies six novel susceptibility loci for type 2 diabetes. Nature genetics,43(10):984, 2011.

[36] Kaoru Kurosawa and Yasuhiro Ohtaki. Uc-secure searchable symmetric encryp-tion. In Financial Cryptography and Data Security, pages 285–298. Springer,2012.

[37] Rui Li, Alex X. Liu, Ann L. Wang, and Bezawada Bruhadeshwar. Fast range queryprocessing with strong privacy protection for cloud computing. Proc. VLDBEndow., 7(14), 2014.

[38] Adriana López-Alt, Eran Tromer, and Vinod Vaikuntanathan. On-the-fly multi-party computation on the cloud via multikey fully homomorphic encryption. InProceedings of the Forty-fourth Annual ACM Symposium on Theory of Computing,STOC ’12, pages 1219–1234, 2012.

[39] Siew-Kee Low, Atsushi Takahashi, TaiseiMushiroda, andMichiaki Kubo. Genome-wide association study: a useful tool to identify common genetic variants associ-ated with drug toxicity and efficacy in cancer pharmacogenomics, 2014.

[40] Jacqueline MacArthur, Emily Bowler, Maria Cerezo, Laurent Gil, Peggy Hall,Emma Hastings, Heather Junkins, Aoife McMahon, Annalisa Milano, JoannellaMorales, et al. The new nhgri-ebi catalog of published genome-wide associationstudies (gwas catalog). Nucleic acids research, 45(D1):D896–D901, 2016.

[41] Moni Naor and Omer Reingold. Number-theoretic constructions of efficientpseudo-random functions. J. ACM, 2004.

[42] Muhammad Naveed, Shashank Agrawal, Manoj Prabhakaran, XiaoFeng Wang,Erman Ayday, Jean-Pierre Hubaux, and Carl Gunter. Controlled functionalencryption. In CCS, 2014.

[43] MuhammadNaveed, Manoj Prabhakaran, and Carl AGunter. Dynamic searchableencryption via blind storage. In Security and Privacy (SP), 2014 IEEE Symposiumon, pages 639–654. IEEE, 2014.

[44] IPSOS MORI official statement. https://www.ipsos-mori.com/newsevents/latestnews/1390/Ipsos-MORI-response-to-the-Sunday-Times.aspx.

[45] Cengiz Orencik, Ayse Selcuk, Erkay Savas, and Murat Kantarcioglu. Multi-keyword search over encrypted data with scoring and search pattern obfuscation.International Journal of Information Security, 15(3):251–269, 2016.

[46] Antonis Papadimitriou, Ranjita Bhagwan, Nishanth Chandran, RamachandranRamjee, Andreas Haeberlen, Harmeet Singh, Abhishek Modi, and Saikrishna

http://www.thesundaytimes.co.uk/sto/news/uk_news/National/article1258476.ece?CMP=OTH-gnws-standard-2013_05_11



https://www.ipsos-mori.com/newsevents/latestnews/1390/Ipsos-MORI-response-to-the-Sunday-Times.aspx

https://www.ipsos-mori.com/newsevents/latestnews/1390/Ipsos-MORI-response-to-the-Sunday-Times.aspx


Badrinarayanan. Big data analytics over encrypted datasets with seabed. In OSDI,pages 587–602, 2016.

[47] Vasilis Pappas, Fernando Krell, Binh Vo, Vladimir Kolesnikov, Tal Malkin, Se-ung Geol Choi, Wesley George, Angelos Keromytis, and Steve Bellovin. Blindseer: A scalable private dbms. In Security and Privacy (SP), 2014 IEEE Symposiumon, pages 359–374. IEEE, 2014.

[48] Christopher D Pilcher, Joseph KWong, and Satish K Pillai. Inferring hiv transmis-sion dynamics from phylogenetic sequence relationships. PLoS medicine, 5(3):e69,2008.

[49] Rishabh Poddar, Tobias Boelter, and Raluca Ada Popa. Arx: A strongly encrypteddatabase system. IACR Cryptology ePrint Archive, 2016:591, 2016.

[50] Geong Sen Poh, Moesfa Soeheila Mohamad, and Ji-Jian Chin. Searchable sym-metric encryption over multiple servers. Cryptography and Communications,10(1):139–158, 2018.

[51] Raluca Ada Popa, Catherine Redfield, Nickolai Zeldovich, and Hari Balakrishnan.Cryptdb: protecting confidentiality with encrypted query processing. In Proceed-ings of the Twenty-Third ACM Symposium on Operating Systems Principles, pages85–100. ACM, 2011.

[52] M. G. Reed, P. F. Syverson, and D. M. Goldschlag. Anonymous connections andonion routing. IEEE J.Sel. A. Commun., 16(4):482–494, September 2006.

[53] Amit Sahai and Brent Waters. Fuzzy identity-based encryption. In EUROCRYPT,pages 457–473, 2005.

[54] Elaine Shi, John Bethencourt, TH Hubert Chan, Dawn Song, and Adrian Perrig.Multi-dimensional range query over encrypted data. In Security and Privacy,2007. SP’07. IEEE Symposium on, pages 350–364. IEEE, 2007.

[55] Victor Shoup. NTL: A library for doing number theory. https://www.shoup.net/ntl/.

[56] Stephen Tu, M. Frans Kaashoek, Samuel Madden, and Nickolai Zeldovich. Process-ing analytical queries over encrypted data. In Proceedings of the 39th internationalconference on Very Large Data Bases, PVLDB’13, pages 289–300. VLDB Endow-ment, 2013.

[57] Pim van der Harst and Niek Verweij. The identification of 64 novel genetic lociprovides an expanded view on the genetic architecture of coronary artery disease.Circulation research, pages CIRCRESAHA–117, 2017.

[58] Benjamin F Voight, Laura J Scott, Valgerdur Steinthorsdottir, Andrew P Mor-ris, Christian Dina, Ryan P Welch, Eleftheria Zeggini, Cornelia Huth, Yurii SAulchenko, Gudmar Thorleifsson, et al. Twelve type 2 diabetes susceptibilityloci identified through large-scale association analysis. Nature genetics, 42(7):579,2010.

[59] Charlotte Wang, Wen-Hsin Kao, and Chuhsing Kate Hsiao. Using hammingdistance as information for snp-sets clustering and testing in disease associationstudies. PloS one, 10(8):e0135918, 2015.

[60] Andrew Chi-Chih Yao. How to generate and exchange secrets. In FOCS, 1986.[61] Samee Zahur and David Evans. Obliv-c: A language for extensible data-oblivious

computation. IACR Cryptology ePrint Archive, 2015, 2015.[62] Andrew Zelin. https://www.mrs.org.uk/pdf/Andrew_Zelin_presentation.pdf.

A NOTATIONData Owner - DSingle Data-Owner Functionally Encrypted Datastore - sFEDInput of D to sFED functionality - ZSingle Data-Owner Searchably Encrypted Datastore - sSEDSet of keywords in the sSED database - KSet consisting of the search-attributes in input by D -W;W =

P(K)

The set of unique identifiers for each record in the sSED database -I

Input of D to sSED functionality -W ⊆ W × ISet of id’s selected by a query and returned by the sSED functional-ity - T ⊆ I; T = Q[W ]Single Data-Owner Computably Encrypted Datastore - sCEDSet consisting of the compute-attributes in input by D - XInput of D to sCED functionality - X ⊆ I × XThe set of unique identifiers for each record in the sCED database -J ⊆ I; J = {id|∃x s.t. (id, x) ∈ X }multiplicity of an element y in set S - µS (y)Storage Server - SAuxilliary Server - A

Clients - CQueries from clients - QFunctionally Encrypted Datastore - FEDSearchably Encrypted Datastore - SEDComputably Encrypted Datastore - CEDPublic key encryption - JMKPK denotes encryption of M using apublic-key PK

SEDCED

map &merge-map

OPRF-basedSFE-based

sFEDSection II

(see Figure 2)

FED



sCED


Section III-A sSED

Section III-B

Section IV-C

OnionSecret-Sharing

Section IV-A

sFEDD C

A

S

Z Q, f

f(Q[Z])

sCED

D CSA

X

Q

f(δT[X])

sSED

W

f

Q[W ]

T

Z Q, f

f(Q[Z])

CED

CSA

Xm

Q

f(δT[X])

SED

Wm

f

Q[W ]

T

Q, f

f(Q[Z])FED C

A

S

Z1

Q, f

f(Q[Z])

Dm

D1

Zm

..

. … DmD1ZmZ1

X1

W1

A S

QW1 Wm

…

Q[W ]

C

W Q

K"

Q[W ]map

SED

DmD1

K#

merge-map

Figure 6: FED protocol template. Each data-owner splits its inputdata Zi asWi and Xi , which it inputs to SED and CED, respectively.W denotes the combined data-set

⋃iWi .

B RELATEDWORKIn Table 1, we give a comparison of our work with different relatedworks. As is evident from the table, our work represents a differenttrade-off between functionality, security and query-phase efficiency.

C sCED FOR GENERAL FUNCTIONSIn Figure 7 we present the details of the sCED scheme for generalfunctions discussed in Section 3.4, as well as the modifications re-quired to convert it into anCED scheme, as discussed in Section 4.4.

D ALTERNATE PROTOCOL FOR SUMMATIONOur protocol for summation in Section 3.4 uses additive secretsharing to compute sum. In this section we describe an alternativeprotocol that uses an additive homomorphic scheme to achieve thesame functionality. The advantage of using such a scheme is thatwe can prevent the leakage of the filtered set T to A and reduce theasymptotic communication overheads. However, there is tradeoffin efficiency; the initialization phase is slower as there are additivehomomorphic encryptions that need to be computed. Experimentalcomparison of the two protocols is present in Appendix J.

Additive Homomorphic Summation for sCED:

The scheme relies on an additive homomorphic encryption scheme(like Paillier’s scheme), which supports rerandomization of cipher-texts (or more specifically, homomorphically adding a random ci-phertext (of a random value) to any given ciphertext results in arandom ciphertext). The domain of the values X is identified witha finite group that forms the message space of the homomorphicencryption scheme.• InitializationPhase:D creates a public/secret key-pair (PK, SK)for the homomorphic encryption scheme. It then sends SK to

https://www.shoup.net/ntl/

https://www.shoup.net/ntl/

https://www.mrs.org.uk/pdf/Andrew_Zelin_presentation.pdf


Table 1: Comparison of related works

Functionality Security Query-Phase Efficiency∗∗

Search-and-Compute

SearchsupportsBooleanformulas

Computesupportsgeneralfunctions

MultiData-Owner

MultiClient

Corruption LevelNotes/Additional Assumptions Client Server

Server(s) Client(s) Data-Owner(s)

CryptDB[51]

Y Y Y (SQL) N N Passive Passive PassiveTrusted application server and passiveDBMS server. No server-client collusion.Only snapshot attacker.

O(1) O(t)

Seabed [46] Y Y Y (OLAP) N Y Passive Passive PassiveNo server-client collusion. Clients need akey given by the data owner. Only snap-shot attacker.

O(1) O(n)

Arx [49] Y Y N N N Passive Passive PassiveTrusted application server and passiveDBMS server. No server-client collusion.Only snapshot attacker.

O(1) O(t logn)

OXT [13] N Y NA N N Passive Passive Passive - O(t1k) O(t1k)

OSPIR [31] N Y NA N Y Passive Malicious MaliciousNo server-client collusion. Also providesquery privacy to clients assuming noserver-data owner collusion.

O(t1k) O(t1k)

BlindSeer[47], [22]

N Y NA N Y Passive Malicious PassiveNo server-client collusion. Also providesquery privacy to clients assuming noserver-data owner collusion.

O(t1 logn) O(t1 logn)

CFE [42] N NA Y (circuits) Y Y Passive Malicious PassiveNo authority-client or authority-storagecollusion. Can provide query privacy toclients.

O(n) O(n)

This work Y Y Y (circuits) Y Y Passive Malicious PassiveTwo non-colluding servers. Either servercan collude with a subset of clients ordata owners. Can provide search andfunction query privacy to clients.

O(1) O(t1k)

∗∗Based on a query of the form SELECT SUM(COL) WHERE COL1 = α1 AND COL2 = α2 · · · COLk = αk , where COL and COLi are column names. If computation is notsupported, SUM is omitted; if search is not supported WHERE clause is omitted. n denotes the total number of records in the database and t denotes the number of recordsmatching the search condition. WLOG, assume COL1 = α1 filters the least number of records and t1 denotes the number of records satisfying it. In each case, assume that allsupported pre-processing has been done prior to the query.

A and (PK, {cid}id) to S, where cid ← HomEncPK (xid). A and Sstore the values they receive.• Computation Phase: S first picks a random value y ∈ X andcomputes ρ ← HomEncPK (y); it then computes c = ρ+

∑id∈T cid,

where the summation notations (+ and∑) represent the homo-

morphic addition of ciphertexts supported by the encryptionscheme. It sendsy toC and c toA.A computes z ← HomDecSK (c)and sends z to C, who outputs z − y.• Leakage: On initialization, the ID-set J is leaked to S.

The main advantage of the scheme is that this prevents theleakage of the set T to A, as S can aggregate the values (under alayer of encryption) without involving A. However, this comes atthe cost of efficiency. In particular, the initialization phase here ismuch slower as it involves public key encryptions for each elementin X .

Security against S relies on the semantic security of the homo-morphic encryption scheme. In particular, the view of S (during theinitialization phase) merely involves ciphertexts. Security against Aand C (if not colluding with S) are information-theoretic: during thecomputation phase A sees only a random ciphertext of a randomvalue, and the view of C consists of a fresh secret-sharing of the

final output.

Additive Homomorphic Summation for CED:

We adapt the protocol above to the multi data-owner setting.The modified scheme has the same computation phase as before,but the initialization phase changes as follows:

• Modified Initialization Phase: A creates a public/secret key-pair (PK, SK) for the homomorphic encryption scheme, and pub-lishes PK . EachDi , for each id in its data-set sends {J(id, cid)KPK S }idtoA, where cid ← HomEncPK (xid) (possibly paddedwith dummyentries). A gathers these sets from all Di , sorts them, and sendsthem to S. S recovers the set {id, cid}id.

The modified initialization phase leaks (an upper bound on) |Xi |for each data-owner to A. (Leakage to S remains the same, namelythe ID-set J .)

The protocol relies on the same security analysis as the singledata-owner version. Secure routing through A ensures that S doesnot know about the composition of the data from each data-owner.


A sCED scheme for general functions:The values computed over (i.e., elements in X) are represented as bit strings of the same size, say |x | = t . We shall refer to a Symmetric Key Encryption scheme(SKE) with algorithms (SKE.Gen, SKE.Enc, SKE.Dec) (denoting Key Generation, Encryption and Decryption algorithms respectively).• Initialization Phase: The data-owner D picks a PRF key K . For each (id, x id) ∈ X , where x id = (x id

1 , · · · , x idt ), it computes {ωid,i , λ0id,i , λ

1id,i }

ti=1, where

ωid,i = xid,i ⊕ νid,i , with νid,i being a pseudorandom bit, and λbid,i (for b = 0, 1) are two encryption keys generated by SKE.Gen, all computed using

FK (id, i) as the source of randomness. D sends {id, {λx idi

id,i , ωid,i }ti=1 }id∈X to S, and it sends K to A, for storing.

• Computation Phase: C’s input is a function f defined on a multi-set (order of the inputs not being important). We assume that given a number n, a circuitrepresentation of f can be efficiently computed corresponding the input being a multi-set of size n. The storage server S’ input is a set T ⊆ X .(1) S sends T to A (randomly permuted). Then, for each id ∈ T , A computes {νid,i , λ0id,i , λ

1id,i }

ti=1 using {FK (id, i)}

ti=1.

(2) C sends f to A. Then, A constructs a garbled circuit for f specialized for |T | inputs. The input wires of this circuit are indexed by {(id, i)}id∈T ,i∈[t ]. Let(τ 0id,i , τ

1id,i ) be the pair of wire-labels used for each such input wire.

Then, for each (id, i), and b ∈ {0, 1}, it defines the ciphertext

cbid,i ← SKE.Encλb⊕νid,iid,i

(τb⊕νid,iid,i ).

Note that cωid,iid,i is the encryption of τ

x idi

id,i using the key λx idi

id,i .A sends to S {(c0id,i , c

1id,i )}id∈T ,i∈[t ], along with the garbled circuit except the output decoding map. It sends the output decoding map to C.

(3) For each (id, i), S decrypts cωid,iid,i using λ

x idi

id,i , to get τx idi

id,i . It uses these labels for the input wires to evaluate the circuit, and obtains the output labels. Itsends these output labels to C.

(4) C uses the output decoding map from A, and output labels from S to calculate the output.• Leakage: On initialization, the ID-set J is leaked to S. On each query, T and f are leaked to A, and the circuit structure of f (as revealed by the garbledcircuit) is leaked to S.

Modifications to convert to an CED scheme:• Modified Initialization Phase: Each data-owner Dj , for each (id, x id) ∈ X j , generates νid,i and λbid,i (for b = 0, 1) randomly (instead of using a key shared

with A). It computes ζid = {J(νid,i , λ0id,i , λ1id,i )KPKA }

ti=1, and γid = (id, ζid, {λ

x idi

id,i , ωid,i }ti=1), where ωid,i = xid,i ⊕ νid,i as before. It sends {JγidKPKS }id∈Jj

to A who collects them from all the data-owners, sorts them lexicographically, and sends them to S. S decrypts them to obtain {γid }id∈J , where J =⋃j Jj .

• Modification of the Computation Phase: Step (1) is modified as follows:(1) S sends {ζid }id∈T to A (randomly permuted), and A decrypts them to obtain, for each id ∈ T , {νid,i , λ0id,i , λ

1id,i }

ti=1.

The rest of the computation phase proceeds as above.• Modified Leakage: On initialization, the ID-set J =

⋃j Jj is leaked to S, and the sizes { |Jj | }j are leaked to A. On each query, T (or rather, its pattern)

and f are leaked to A, and the circuit structure of f (as revealed by the garbled circuit) is leaked to S.

Figure 7: A sCED scheme for general functions, and modifications required to convert it into an CED protocol.

E MODIFYING EXISTING SSE SCHEMES INLITERATURE

Many existing SSE schemes and their multi-client versions in liter-ature can be modified and used in our setting of SED. We describeone such instantiation by using the MC-OXT scheme of [31]. Weprovide the core ideas of their scheme in Appendix E.1, explain ourmodifications in Figure 8 and use this for our implementation. Ourmodifications do not require prior knowledge of the scheme andcan be understood by a high level idea of the entities involved andthe functionality expected. Our techniques for modifying MC-OXTscheme can be extended to other SSE schemes as well. Please seeFigure 8 for a high-level idea of the protocol flow in MC-OXT, ourmodifications as well as security-efficiency tradeoffs.

E.1 Brief Description of MC-OXTWe give a brief overview of the multi-client SSE scheme (MC-OXT)of [31]. In this setting, we have a data owner D who processesa large database DB to produce an encrypted database EDB anda master key MSK, of which it sends EDB to the server S. Themodel allows for multiple clients C, who may request D for tokenscorresponding to certain queries Q j , which D constructs using itsMSK. It provides the client with a key corresponding to Q j , along

with a signature so that S may verify that C is authorized to makethis query. The client transforms the received token, say SK j , intosearch tokens required by the OXT protocol [13]. By choosing a(singly) homomorphic signature in the previous step, the client mayalso transform the received signature on SK j into one for the searchtokens. It sends the search tokens and corresponding signaturesto the server, who verifies the signature, and executes the OXTsearch protocol. It obtains encrypted id values, which it returns tothe client, who is in possession of the decryption key.

Next, we provide a (simplified) overview of the protocol, whichbuilds on OXT of [13]. The protocol is designed to support arbitraryBoolean queries but for ease of exposition here, we will considerthe special case where the client makes conjunctive queries of theform w1 ∧ w2 ∧ . . . ∧ wn where ∀i,wi ∈ K . The least fequentword, w1 (say) is known as s-word, while the remaining wordsw2, . . . ,wn are called x-words. It is assumed that the data ownermaintains information that lets it compute the s-word for any queryof a client.

The setup phase of the MC-SSE protocol, as in OXT, constructstwo sets called TSet and XSet as follows. First, it picks PRFs Fτ andFp with the appropriate ranges, and their keys KS and {KX ,KI }

respectively. Intuitively, the keys KS and KX are used to generate


pseudorandom handles of keywords w ∈ K , denoted by strap =Fτ (KS ,w) and xtrap = дFp (KX ,w ) respectively. The key KI is usedto generate a pseudorandom handle of document indices id ∈ DBas xind = Fp (KI , id). KT is a key used to compute stag values asrequired by the TSet implementation. Next, the protocol computesxtag = дFp (KX ,w )·xind for all “valid” (w, id) pairs, that is pairs suchthat the document id contains the keyword w , and adds xtag toXSet.

To populate TSet(w) forw ∈ K so as to enable efficient responseto queries such asw1∧w2∧. . .∧wn , the protocol does the following.If wi has Twi matching documents, then TSet(wi ) contains Twi

entries, each containing the relevant document index (encrypted) aswell as “blinded” xind values to enable the server to check whetherthe remaining keywords also match the given document. Moreformally, for each list TSet(w) forw ∈ K , the data owner generatesa fresh pair of keys (Kz ,Ke ) using strap(w) and encrypts the Twdocument indices as e = Enc(Ke , id). Next, for c ∈ [Tw ], it computesa blinding of the Tw index representatives xind1, . . . , xindTw asfollows: set zc = Fp (Kz , c) and y = xind · zc−1. Store (e,y) inTSet(w). Note that (e,y) are constructed using keys that are specificto s-wordw . It chooses a key KM for AuthEnc - an authenticatedencryption scheme, and this key is shared with the server. The keys(KS ,KX ,KI ,KT ,KM ) are retained by D as the master secret.

When the client requests a token for the queryw1 ∧w2 ∧ . . . ∧wn , the data owner computes the s-word, computes stag usingKT , strap using KS as well as xtrap2, . . . , xtrapn corresponding tow2, . . . ,wn using KX . Next, it blinds the xtrap values as bxtrapi =дFp (KX ,wi )·ρi where ρi are chosen randomly from Z∗p . Then it cre-ates an authenticated encryption env← AuthEnc(KM , (stag, ρ2, . . . , ρn ))and returns SK = (env, strap, bxtrap2, . . . , bxtokenn ) to the client.

The client uses SK to construct search tokens as follows. Usestrap to compute Kz and for c ∈ [Tw ], compute zc = Fp (Kz , c).Now, it sets bxtoken[c, i] = bxtrapzci and sends env along with allthe bxtoken[c, i] to the server. The server verifies the authentic-ity of env using KM , and decrypts it to find stag and ρ2, . . . , ρn .It uses stag to obtain TSet(w1) and for i ∈ {2, . . . ,n}, it checksif bxtoken[c, i]y/ρi ∈ XSet. If so, it retrieves the correspondingencrypted document index e and sends it to the client.

F SECURE FUNCTION EVALUATION (SFE)PROTOCOL FOR SED

F.1 Protocol for computing equality checksWe use securely pre-computed correlations for a secure evaluationof an equality check between two parties. The strings to check forequality will be mapped using a collision resistant hash to, say 64bits, and interpreted as elements in a field, say F = GF (264). Theprotocol uses a pre-computed correlation: Alice holds (a,p) ∈ F2and Bob holds (b,q) ∈ F2 such that a + b = pq and p,q and one ofa,b are uniformly random. Alice sends u := x − p to Bob, where xis her input. Bob replies with v := q(u −y)+b, where y is his input.Alice concludes x = y iff a = v .

F.2 Improving EfficiencySuppose the keyword space partitions as K = B1 ∪ · · · ∪ Bℓ , suchthat there is an upperbound Li j on |Ki ∩ Bj |, for each data-owner

Di and each binBj . Given appropriate statistics on the data domain,such upper bounds can often be derived for easy to compute parti-tions (e.g., a bin may contain keywords in the English vocabularythat start with a certain set of letters or short prefixes). Then, theprotocol can be easily modified so that the equality-checks step(Step 3 in Figure 9 can be repeated for each bin separately, with-out having to compare keywords across bins. The total number ofequality checks needed, becomes O(

∑j (∑i Li j )

2); when ℓ bins areused and if each Li j = O(Li/ℓ), this becomesO((

∑i Li )

2/ℓ), provid-ing a saving of up to factor ℓ over the original construction. (Theassumption that we can take Li j = O(Li/ℓ) would hold only whenℓ is not too large, placing a limit on the efficiency gain possibleusing binning.)

G REDUCING SED LEAKAGE FURTHERThe template used in Section 4.3 reveals the merge-mapped data-set W to A. The mapping ensures that the search results Q[W ] arenot linked to W , and can be simulated independently given onlytheir pattern information. Nevertheless, W itself reveals the entirekeyword-identity incidence matrix, as mentioned in the leakagedescription in Section 4.3.1 and Section 4.3.2. This may be acceptableif the statistics of the overall matrix are not private; further, byadding noise in the form of dummy keywords and identities onecan further limit the accessible information in W .

However, there may be scenarios where it would be desirable toalmost completely eliminate the leakage from W to A. We presenta modification of the earlier two constructions which achieves this,as long as the search queries are individual keyword membershipqueries (rather than predicates over such membership queries).Observe that in the case of individual keyword queries, we maymap an id used by a data-ownerDi to multiple values idτ , for each τsuch that (τ , id) ∈ Wi . Then, on a keyword query, at most one suchvalue will be recovered (corresponding to the queries keyword);we shall require that on decoding any such value is mapped backto the same id. Then, instead of the (unlabelled) keyword-identityincidence matrix, A learns only the total number of keywords andtheir “degree distribution” (i.e., the histogram showing, for each n,the number of keywords that appear in n documents in the mergeddataset). A does not even learn the total number of ids.

For the constructions in Section 4.3.1 and Section 4.3.2, thisis easily implemented by replacing the ciphertext ζ idi by ζ idi ,τ ←

JidKPK S (i.e., fresh encryptions for each (τ , id) ∈ Wi ).

H OVERVIEW OF SECURITY ANALYSISIn this section, we provide a detailed security analysis of our SEDprotocols. The security analysis of the sFED and FED templates aresimilar to (and somewhat simpler than) that of the SED protocoltemplate discussed below.

We focus on the instantiation of the SED protocol template (Sec-tion 4.2) using the components developed in Section 4.3.1. Theanalysis of the full SED protocol breaks down into the analysis ofthe template protocol (Section 4.2) and the protocols formerge-map,map and sSED separately. Note that to enable this modular analy-sis, it is crucial that these three functionalities are fully specified,including their leakages.


High Level Overview of query phase of MC-OXT in [31]:

(1) C inputs a conjunctive query and sends this to D.(2) D performs computation and sends tokens as authorization to C.(3) C performs additional computation over these tokens and sends them to S.(4) S uses the tokens by C to perform search and returns the encrypted set of document indices to C.(5) C locally decrypts these indices to obtain output of the MC-OXT protocol.

Modification to an sSED protocol:

• Initialization phase:The initialization phase proceeds exactly as in the MC-OXT protocol, finally, the keys for authorizing a query are now sent to A instead of being with D.• Query phase:(1) C inputs a conjunctive query and sends this to A.(2) A performs the computation intended by D and C ( above in steps 2,3) and sends tokens to S.(3) S uses the tokens by A to perform search and returns the encrypted set of document indices to A.(4) A decrypts the indices to obtain output of the MC-OXT protocol and sends this to S.

Leakage:• Leakage to A is equivalent to leakage to D during query phase of MC-OXT. A has an additional leakage of the filtered set of id’s.• C does not learn the filtered set of id’s, the size of the documents matching to least frequent keyword or the pattern of the least frequent keyword. In otherwords, C does not incur any leakage in this modification.• Leakage to S stays same as in MC-OXT.

Modification using Additive Homomorphic Encryption:Notation ⟨⟨M ⟩⟩PK denotes additive homomorphic encryption of M using a public-key PK .• Initialization Phase:(1) A and S create a public/secret key-pair (PKA, SKA) and (PKS, SKS) for the homomorphic encryption scheme and publish PKA and PKS respectively.(2) D proceeds exactly similar as in MC-OXT. Whenever D encrypts the database in MC-OXT. It stores an additive homomorphic encryption of id instead of

a symmetric key encryption, i.e. Enc(K , id) is replaced by ⟨⟨id⟩⟩PKA .• Query Phase:(1) C inputs a conjunctive query and sends this to A.(2) A performs the computation intended by D and C ( above in steps 2,3) and sends tokens to S.(3) S uses the tokens by A to perform search and returns the encrypted set of document indices to A, i.e. S receives {cid }id∈T where cid = ⟨⟨id⟩⟩PKA . For each

id, it chooses a random value and adds the random value to the ciphertext. i.e. γid = ⟨⟨rid ⟩⟩PKA + cid and δid = ⟨⟨−rid ⟩⟩PKS . S sends {(γid, δid)}id∈T to A.(4) A decrypts using SKA to reveal {(id+ rid, δid)}id∈T . A encrypts using PKS and calculates {ηid }id∈T where ηid = ⟨⟨id+ rid ⟩⟩PKS + δid. It permutes this set

and sends it to S. (Note that: ηid = ⟨⟨id + rid ⟩⟩PKS + ⟨⟨−rid ⟩⟩PKS = ⟨⟨id⟩⟩PKS )(5) S decrypts using SKS to reveal {id}id∈T .We note that there is a need for permuting the plaintext id’s from the ciphertexts to stay consistent with the exact leakage to S in MC-OXT. Otherwise, Slearns statistics about each id that was decrypted due to the intersections with other keywords that it calculated.

Leakage:• Leakage to A is equivalent to leakage to D during query phase of MC-OXT. A additionally has knowledge of the size of the filtered set of id’s.• C does not incur any leakage in this modification.• Leakage to S stays same as in MC-OXT.

Modification by Removing Encryption:

• Initialization Phase:(1) D proceeds exactly as in MC-OXT, it stores the plaintext in clear, i.e. Enc(K , id) is replaced by id.• Query Phase:(1) C inputs a conjunctive query and sends this to A.(2) A performs the computation intended by D and C ( above in steps 2,3) and sends tokens to S.(3) S uses the tokens by A to perform search and learns the set of filtered indices i.e. {id}id∈T

Leakage:• Leakage to A is equivalent to leakage to D during query phase of MC-OXT. A has no additional leakage.• C does not incur any leakage in this modification.• Leakage to S increases when compared to the leakage of S in MC-OXT. S learns each id in the least frequent keyword and is also able to learn intersectionpatterns between each id in the least frequent keyword and other queried keywords. However, if performing single keyword search, the leakage in thisprotocol exactly matches leakage in MC-OXT.

Figure 8: Modification of MC-OXT in [31] to our sSED setting.


merge-map:

(1) Keys. S generates a keypair for PKE, (PKS, SKS), and similarly A generates (PKA, SKA). A generates a PRF key K . The keys PKS and PKA are published.(2) Shuffled Sharing. At the end of this phase, for each (i , τ ) such that τ ∈ Ki :

- S holds (αi ,τ , tagi ,τ ), where αi ,τ ∈ {0, 1}λ is a random string (as long as a keyword), and tagi ,τ is a unique (random) index for each (i , τ );- A holds (βi ,τ , Γi ,τ , Θi ,τ , tagi ,τ ), where:– βi ,τ := τ ⊕ αi ,τ– Γi ,τ := {Γidi ,τ |(τ , id) ∈ Wi } (padded as explained below). Here Γidi ,τ := (ξ idi ,τ , γ

idi ,τ , Jη

idi ,τ , Jµ

idi ,τ KPKAKPKS ) with:

◦ ξ idi ,τ ← JJζ idi KPKAKPKS (fresh encryptions for each (i , id, τ )), where ζ id

i ← JidKPKS (a single encryption per (i , id)),◦ γ id

i ,τ , ηidi ,τ , µ

idi ,τ being a random additive secret-sharing of 0λ (i.e., γ id

i ,τ ⊕ ηidi ,τ ⊕ µ

idi ,τ = 0λ ).

Γi ,τ is padded with additional entries of the form Γ⊥i ,τ – which is defined identically as Γidi ,τ except that ζ idi ← J⊥KPKS – so that |Γi ,τ | is an a priori

fixed value.– Θi ,τ := (θi ,τ , Jϕi ,τ KPKS ) for a random ϕi ,τ ; and θi ,τ := τ ⊕ ϕi ,τ .

This phase proceeds as follows:• Each Di , for each τ ∈ Ki picks two random masks αi ,τ and ϕi ,τ and computes πi ,τ := (βi ,τ , Γi ,τ , Θi ,τ ), as defined above. Then it sends(αi ,τ , Jπi ,τ KPKA ) to S. (The number of elements sent by each Di is padded up to an a priori bound, with dummy entries of the form (α , J⊥KPKA ).)• S collects entries of the form (αi ,τ , ρi ,τ ) from all Di , and randomly permutes them. Let tagi ,τ denote the serial number of these elements in the permutedorder. S stores (αi ,τ , tagi ,τ ) and communicates (ρi ,τ , tagi ,τ ) to A.• A decrypts the entries to obtain (πi ,τ , tagi ,τ ). (It will discard the entries where π = ⊥.)

(3) Grouping Keywords Using Equality Checks. At the end of this phase, there is a (hidden) injective mapping h :⋃i Ki → [L] (where L equals |

⋃i Ki |,

or an upper-bound thereof). For each (i , τ ) with τ ∈ Ki , A will hold a tuple (h(τ ), Γi ,τ , Θi ,τ ).• Below we index the entries held by S and A as αt , πt etc., t being the tag value. For every two tags, t , t ′, S and A engage in a 2-party secure equalitycheck protocol to check if αt ⊕ αt ′ = βt ⊕ βt ′ . Note that this equality holds iff τt = τt ′ . A learns the results.(If πt = ⊥, A uses an arbitrary string instead of βt in the equality check protocol.)• A partitions the set of tags into equivalence classes T1, · · · ,TL , such that for all k ∈ [L], and t , t ′ ∈ Tk , the equality test above was positive. (Hencethere is some keyword τk corresponding to all the elements in Tk . This implicitly defines the mapping h : τk 7→ k .)

(4) Culling Dummy Entries. At the end of this phase, for each (i , τ , id) with (τ , id) ∈ Wi , A obtains a tuple (h(τ ), ζ idi ).

• A sends to S (in random order) entries of the form (h(τ ) ⊕ γ idi ,τ , Jη

idi ,τ , Jµ

idi ,τ KPKAKPKS , ξ

idi ,τ ) where some of the entries may have id = ⊥ (in which case,

ξ idi ,τ = J⊥KPKS ).• From each such triple, where the last two items are ciphertexts under PKS, S extracts (ηidi ,τ , Jµ

idi ,τ KPKA ) from the first ciphertext and either Jζ id

i KPKAor ⊥ from the second one. It discards all entries where ⊥ is obtained from the second ciphertext. It then permutes and sends back entries of the form(h(τ ) ⊕ γ id

i ,τ ⊕ ηidi ,τ , Jµ

idi ,τ KPKA , Jζ

idi KPKA ) to A.

• From each tuple received, A decrypts the second item to obtain µ idi ,τ , and the third item to receive ζ idi ; combining the former with the first item, A

recovers h(τ ) ⊕ γ idi ,τ ⊕ η

idi ,τ ⊕ µ

idi ,τ = h(τ ).

(5) Sharing the Mapped Keywords. At the end of this phase:- For each (i , τ , id) with (τ , id) ∈ Wi , A obtains a tuple (h(τ ), τ ) where τ = FK (h(τ )), and- S obtains {(k , σk ) |k ∈ L } where, when k = h(τ ), σk = τ ⊕ τ .• For each k ∈ [L], A sets τk = FK (k ) Then, it chooses a representative t ∗k ∈ Tk and sends (θt ∗k ⊕ τk , Jϕt

∗kKPKS ) to S (sorted by k ). (If there are fewer

than L equivalence classes, randomly generated θ and ϕ values are used to generate dummy messages.)• S receives an ordered list of L pairs (δk , Jϕk KPKS ), and it computes σk = δk ⊕ ϕk .

(6) Outputs. A outputs KA = K and W consisting of elements (τ , ζ idi ). S outputs KS = (Z , SKS), where Z is the set {(k , σk ) |k ∈ [L]} = {(h(τ ), σh(τ )) |τ ∈⋃

i Ki }.

map: A client C has an input Q which is specified using one or more keywords. For each keyword τ appearing in Q , C engages in a protocol with S and A tocompute τ . The protocol is as follows:• C sends additive secret-shares γ , δ to S and A respectively, where γ ⊕ δ = τ .• For k = 1, · · · , L, S and A carry out an equality check protocol to check if (γ ⊕ σk ) = (δ ⊕ FK (k )). The output is revealed to A. There will be at most onevalue k∗ such that the equality holds.• A sends FK (k∗) to C, who takes it as τ . (If no such k∗ exists, A sends a random value, or alternately, if permitted, may reveal to C that there is no match.)

Figure 9: Protocols implementing merge-map and map functionalities for instantiating the template for SED protocols, based on SFE.

First we analyzemerge-map, in Figure 4. Instead of directly usingthe random oracle model, we assume an ideal OPRF functionality(which is then UC-securely realized in the random oracle model).Given the ideal OPRF functionality, the view of S consists of Liinvocations of OPRF be data-owner Di , followed by Ni cipher-texts encrypted under PKA. These can be simulated knowing Ni , Liwhich are part of the leakage, assuming semantic-security of thepublic-key encryption scheme. (Note that we assume that all thekeywords are encoded into bit-strings of the same length.) The view

of A consists of a set of ciphertexts (permuted to disassociate withthe data-owners who created it) which can be perfectly simulatedknowing only

∑i Ni (which is part of the leakage) and W (which

is part of the output). For this, we rely on the semantic security ofthe public-key encryption scheme.

The protocol formap in Figure 4 is easily seen to be a UC-securerealization of the corresponding functionality as it directly uses theOPRF functionality. As described in Section 4.3.1, we UC-securelyrealize this functionality against active corruption of the receiver,


in the random oracle model, by modifying the OPRF protocol of[31] (see Footnote 10).

Finally, we turn to the analysis of the SED template protocol(see Figure 3). This relies on the fact that the leakages from SEDto A and S are defined to include the corresponding leakages frommerge-map, map and sSED, as well as the information requiredto simulate the intermediate output (namely W ) that A receives,namely the keyword-identity incidence matrix of the merged data.(Here we rely on the pseudorandomness guarantee of the OPRF toensure that by learning W , A learns nothing more than the unla-beled keyword-identity incidence matrix of the merged data, exceptfor the labeling of keywords held by corrupt data-owners.) Also, KSwhich is output to S is sampled independently of the inputs (andhence perfectly simulated), and the information revealed to S byQ[W ] can be perfectly simulated from Q[W ] and KS. In this proto-col, if the functionality sSED and merge-map are implemented tohandle actively corrupt clients, the overall protocol can also be seento be admit active corruption of clients. merge-map, as mentionedabove, relies on the OPRF protocol for this, and as discussed inSection 3.5, we can modify the searchable encryption protocolsfrom the literature to handle actively corrupt clients.

The analysis of the merge-map and map protocols in Figure 9 ismore tedious, but the simulation is not muchmore complicated thanabove. In particular, the ciphertexts received by the servers S and Athroughout the protocol (encrypted using the other server’s public-key) can be simulated only knowing the number of ciphertexts (byrelying on the semantic secuity of public-key encryption). Theseprotocols also rely on a 2-party equality-check protocol, which canbe replaced for the purposes of analysis using an ideal equality-check functionality. By carefully inspecting the protocol one canverify that the view of either server (colluding with data-ownersor clients, but not with each other) can be entirely simulated usingthe leakage information specified in Section 4.3.2. As this analysisis not particularly enlightening, we omit the details and point thereader to the discussion on onion secret-sharing (Section 4.1) forthe intuition underlying the construction and the analysis.

I IMPLEMENTATION: ADDITIONAL DETAILSWe use the following opensource libraries and codebases in ourimplementation.

• We used the OpenSSL library (version 1.0.2) for the following.PRFs and Symmetric Key Encryption (SKE) were based on AES-256, while public key encryption (PKE) was implemented us-ing RSA-OAEP, using hybrid encryption where appropriate. Forgroup operations in the MC-OXT protocol of [31], we used NIST-224p elliptic curves.• We re-implemented the MC-OXT scheme of [31] and ensuredthat the performance is comparable to that reported in [31].• Additive Homomorphic scheme was implemented using the Pail-lier Cryptosystem using the Libpaillier library [7].• JustGarble [6] was used to implement theCED scheme for generalfunction evaluation.• Two-party computation for CED for composite queries was im-plemented using Obliv-C [61]. This provided an implementationof Oblivious Transfer and Yao’s Garbled Circuit.12

• NTL (Number Theory Library) [55] version 11.3.0 was used toimplement the field operations in our construction using secureequality checks

J EXPERIMENTATIONMore details of our experimentation are summarized below.

• Genome Retrieval: As seen in Figure 5a, sSED is the heavierpart of the computation in sFED. Computation on filtered inputsof size 12000 took close to 11 and 3.7 sec for FED and sFED.

• MAF: Computation on filtered inputs of size 12000 took close to11 seconds for FED and close to 4 sec for sFED (See Figure 5b).This competes with GenomeRet.

• Chi-square Analysis: The Chi-square formula was representedas a garbled circuit with 38000 gates. This required 0.23 sec forcomputation using the Obliv-C framework [61]. Note that thiscomputation stayed invariant of the number of records filtered.Computation on filtered inputs of size 12000 took close to 24 secfor FED and close to 14 sec for sFED. Time taken was more thantwice when compared to MAF or Genome Retrieval as there weretwo summation queries computed in the inner protocol and anadditional polynomial computed in the outer protocol. Please seeFigure 5c for details.

• Average Hamming distance: In this case, the computation isheavy in the sCED stage and all trends observed due to thechanges in sSED are hidden, as seen in Figure 5d. On a populationof size 25000, being filtered to 3100 individuals, the circuit beingcomputed had approximately 80 million gates and 60 millioninput bits. Hence, the two functionalities sFED and FED (bothprotocols i.e. OPRF and SFE) perform similarly. Computation onfiltered inputs of size 3100 took about 30 sec.

• Comparing the two summation protocols: Our two summa-tion protocols are compared in Figure 10. During initialization,while summation based on additive secret sharing took about 6and 0.6 sec in the FED and sFED functionalities respectively, thealternate summation protocol took correspondingly 207 and 202sec for the same dataset. Comparing FED and sFED, we note thatduring initialization the computation and hence performance aresame. During the query phase, the two functionalities performequally well.

12Obliv-C provides a high-level interface for implementing a secure 2-party computa-tion protocol. But it is not suited for generating garbled circuits outside the frameworkof a protocol, for which we relied on JustGarble.

functionally encrypted datastoresshwetaag/papers/fed.pdfconference’17, july 2017, washington, dc,...

Documents