
Pretzel: Email encryption and provider-supplied functions are compatible

Trinabh Gupta∗† Henrique Fingler∗ Lorenzo Alvisi∗‡ Michael Walfish†

∗UT Austin †NYU ‡Cornell

Abstract. Emails today are often encrypted, but only between mail servers—the vast majority of emails are exposed in plaintext to the mail servers that handle them. While better than no encryption, this arrangement leaves open the possibility of attacks, privacy violations, and other disclosures. Publicly, email providers have stated that default end-to-end encryption would conflict with essential functions (spam filtering, etc.), because the latter requires analyzing email text. The goal of this paper is to demonstrate that there is no conflict. We do so by designing, implementing, and evaluating Pretzel. Starting from a cryptographic protocol that enables two parties to jointly perform a classification task without revealing their inputs to each other, Pretzel refines and adapts this protocol to the email context. Our experimental evaluation of a prototype demonstrates that email can be encrypted end-to-end and providers can compute over it, at tolerable cost: clients must devote some storage and processing, and provider overhead is roughly 5× versus the status quo.

1 Introduction

Email is ubiquitous and fundamental. For many, it is the principal communication medium, even with intimates. For these reasons, and others that we outline below, our animating ideal in this paper is that email should be end-to-end private by default.

How far are we from this ideal? On the plus side, hop-by-hop encryption has brought encouraging progress in protecting email privacy against a range of network-level attacks. Specifically, many emails now travel between servers over encrypted channels (TLS [41, 50]). And network connections between the user and the provider are often encrypted, for example using HTTPS (in the case of webmail providers) or VPNs (in the case of enterprise email accounts).

However, emails are not by default encrypted end-to-end between the two clients: intermediate hops, such as the sender’s and receiver’s provider, handle emails in plaintext. Since these providers are typically well-run services with a reputation to protect, many users are willing to just trust them. This trust, however, appears to stem more from shifting social norms than from the fundamental technical safety of the arrangement, which instead seems to call for greater caution.

Reputable organizations have been known to unwittingly harbor rogue employees bent on gaining access to user email accounts and other private user information [26, 99, 135]. If you were developing your latest startup idea over email, would you be willing to bet its viability on the assumption that every employee within the provider acts properly? And well-run organizations are not immune from hacks [121, 122]—nor, indeed, from the law. Just in the first half of 2013, Google [59], Microsoft [94], and Yahoo! [126] collectively received over 29,000 requests for email data from law enforcement, and in the overwhelming majority of cases responded with some customer data [93]. Many of these requests did not even require a warrant: under current law [100], email older than 180 days can be acquired without judicial approval.

End-to-end email encryption can shield email contents from prying eyes and reduce privacy loss when webmail accounts are hacked; and, while authorities would still be able to acquire private email by serving subpoenas to the account’s owner, they would not gain unfettered access to someone’s private correspondence without that party’s knowledge.

Why then are emails not encrypted end-to-end by default? After all, there has long been software that implements this functionality, notably PGP [144]; moreover, the large webmail providers offer it as an option [58, 125] (see also [23, 107, 109, 120]). A crucial reason—at least the one that is often cited [37, 38, 49, 60, 106]—is that encryption appears to be incompatible with value-added functions (such as spam filtering, email search, and predictive personal assistance [27]) and with the functions by which “free” webmail providers monetize user data (for example, topic extraction) [62]. These functions are proprietary; for example, the provider might have invested in training a spam filtering model, and does not want to publicize it (even if a dedicated party can infer it [112]). So it follows that the functions must execute on providers’ servers with access to plaintext emails.

But does that truly follow? Our objective in this paper is to refute these claims of incompatibility, and thus move a step closer to the animating ideal that we stated at the outset, by building an alternative, called Pretzel.

In Pretzel, senders encrypt email using an end-to-end encryption scheme, and the intended recipients decrypt and obtain email contents. Then, the email provider and each recipient engage in a secure two-party computation (2PC); the term refers to cryptographic protocols that enable one or both parties to learn the output of an agreed-upon function, without revealing the inputs to each other. For example, a provider supplies its spam filter, a user supplies an email, and both parties learn whether the email is spam while protecting the details of the filter and the content of the email.

The challenge in Pretzel comes from the 2PC component. There is a tension between expressive power (the best 2PC schemes can handle any function and even hide it from one of the two parties) and cost (those schemes remain exorbitant, despite progress in lowering the costs; §3.2). Therefore, in designing Pretzel, we decided to make certain compromises to gain even the possibility of plausible performance: baking in specific algorithms, requiring that the algorithms not be proprietary (only their inputs, such as model parameters, are hidden), and incurring per-function design work.

The paper’s central example is classification, which Pretzel applies to both spam filtering and topic extraction (for completeness, the implementation also includes a simple keyword search function). Pretzel’s first step is to compose (a) a relatively efficient 2PC protocol (§3.2) geared to computations that consist mostly of linear operations [19, 28, 69, 98, 102], (b) linear classifiers from machine learning (Naive Bayes, SVMs, logistic regression), which fit this form and which have good accuracy (§3.1), and (c) mechanisms that protect against adversarial parties. Although the precise protocol (§3.3) has not appeared before, we don’t claim it as a contribution, as its elements are well-understood. This combination is simply the jumping-off point for Pretzel.

The work of Pretzel is adapting and incorporating this baseline into a system for end-to-end encrypted email. In this context, the costs of the baseline would be, if not quite outlandish, nevertheless too high. Pretzel responds, first, with lower-level protocol refinements: revisiting the cryptosystem (§4.1) and designing a packing technique that conserves calls into the cryptosystem (§4.2). Second, for topic extraction, Pretzel rearranges the setup, by decomposing classification into a non-private step, performed by the client, which prunes the set of topics; and a private step that further refines this candidate set to a single topic. Making this work requires a modified protocol that, roughly speaking, selects a candidate maximum from a particular subset, while hiding that subset (§4.3). Third, Pretzel applies well-known ideas (feature selection to reduce costs, various mechanisms to guard against misuses of the protocol, etc.); here, the work is demonstrating that these are suitable in the present context.

We freely admit that not all elements of Pretzel are individually remarkable. However, taken together, they produce the first (to our knowledge) demonstration that classification can be done privately, at tolerable cost, in the email setting.

Indeed, evaluation (§6) of our implementation (§5) indicates that Pretzel’s cost, versus a legacy non-private implementation, is estimated to be up to 5.4×, with additional client-side requirements of several hundred megabytes of storage and per-email CPU cost of several hundred milliseconds. These costs represent reductions versus the starting point (§3.3) of up to 100×.

Our work here has clear limitations (§7). Reflecting its prototype status, our implementation does not hide metadata, and handles only the three functions mentioned (ideally, it would handle predictive personal assistance, virus scanning, and more). More fundamentally, Pretzel compromises on functionality; by its design, both user and provider have to agree on the algorithm, with only the inputs being private. Most fundamentally, Pretzel cannot achieve the ideal of perfect privacy; it seems inherent in the problem setup that one party gains information that would ideally be hidden. On the other hand, these leaks are generally bounded, and concerned users can opt out, possibly at some dollar cost (§4.4, §7).

The biggest limitation, though, is that Pretzel cannot change the world on its own. As we discuss later (§7), there are other obstacles en route to the ultimate goal: general deployment difficulties, key management, usability, and even politics. However, we hope that the exercise of working through the technical details to produce an existence proof (and a rough cost estimate) will at least shape discourse about the viability of default end-to-end email encryption.

2 Architecture and overview of Pretzel

2.1 Design ethos: (non)requirements

Pretzel would ideally (a) enable rich computation over email, (b) hide the inputs and implementations of those computations, and (c) impose negligible overhead. But these three ideals are in tension. Below we describe the compromises that form Pretzel’s design ethos.

• Functionality. We will not insist that Pretzel provide exact replicas of the rich computations that today’s providers run over emails; instead, the goal is to retain approximations of these functions.

• Provider privacy. Related to the prior point, Pretzel will not support proprietary algorithms; instead, Pretzel will protect the inputs to the algorithms. For example, all users of Pretzel will know the spam filtering model, but the parameters to the model will be proprietary.

• User privacy. Pretzel will not try to enshroud users’ email in complete secrecy; indeed, it seems unavoidable that computing over emails would reveal some information about them. However, Pretzel will be designed to reveal only the outputs of the computation, and these outputs will be short (in bits).

• Threat model and maliciousness. Pretzel will not build in protection against actions that subvert the protocol’s semantics (for example, a provider who follows the protocol to the letter but who designs the topic extraction model to try to recover a precise email); we will deal with this issue by relying on context, a point we elaborate on later (§4.4, §7). Pretzel will, however, build in defenses against adversaries that deviate from the protocol’s mechanics; these defenses will not assume particular misbehaviors, only that adversaries are subject to normal cryptographic hardness.

• Performance and price. Whereas the status quo imposes little overhead on email clients, Pretzel will require storage and computation at clients. However, Pretzel will aim to limit the storage cost to several hundred megabytes and the CPU cost to a few hundred milliseconds of time per processed email. For the provider, Pretzel’s aim is to limit overhead to small multiples of the cost in the status quo.

• Deployability and usability. Certain computations, such as encryption, will have to run on the client; however, Pretzel will aim to be configuration-free. Also, Pretzel must be backwards compatible with existing email delivery infrastructure (SMTP, IMAP, etc.).


Figure 1—Pretzel’s architecture. e denotes plaintext email; e′ denotes encrypted email. The sender’s provider is not depicted.

2.2 Architecture

Figure 1 shows Pretzel’s architecture. Pretzel comprises an e2e module and function modules. The e2e module implements an end-to-end encryption scheme; a function module implements a semantic computation (spam filtering, etc.). The e2e module is client-side only, while a function module has a component at the client and another at the provider.

At a high level, Pretzel works as follows. An email sender uses its e2e module to encrypt and sign an email for an email recipient (step 1). The recipient uses its e2e module to authenticate and decrypt the email (step 2). The e2e module can implement any end-to-end encryption scheme; Pretzel’s current prototype uses GPG [1]. Next, the recipient passes the decrypted email contents to the client-side components of the function modules (step 3). Finally, these client-side components participate in a protocol with their counterparts at the provider (step 4). At the end of the protocol, either the client or the provider learns the output of the computation.

Pretzel’s e2e module requires cryptographic keys for encrypting, decrypting, signing, and verifying. Thus, Pretzel requires a solution to key management [2, 90, 120]. However, this is a separate effort, deserving of its own paper or product, and (as noted in the introduction) is one of the obstacles that Pretzel does not address. Later (§7), we discuss why we are optimistic that it will ultimately be overcome.

The main work for Pretzel surrounds the function modules; the challenge is to balance privacy, functionality, and performance (§2.1). Our focus will be on two modules: spam filtering and topic extraction (§3, §4). We will also report on a keyword search module (§5). But before delving into details, we walk through some necessary background, on the class of computations run by these modules and the cryptographic protocols that they build on.

3 Background, baseline, and related work

3.1 Classification

Spam filtering and topic extraction are classification problems and, as such, require classifier algorithms. Pretzel is geared to linear classifiers. So far, we have implemented Naive Bayes (NB) [63, 89, 92, 101] classifiers, specifically a variant of Graham-Robinson’s NB [63, 101] for spam filtering (we call this variant GR-NB),1 and multinomial NB [89] for topic extraction; Logistic Regression (LR) classifiers [51, 57, 82, 95], specifically binary LR [82] and multinomial LR [51] for spam filtering and topic extraction, respectively; and linear Support Vector Machine (SVM) classifiers [29, 40, 73, 104], specifically two-class and one-versus-all SVM [29] for spam filtering and topic extraction, respectively. These algorithms, or variants of them, yield high accuracy [39, 57, 63, 65, 73, 139] (see also §6.1, §6.2), and are used both commercially [18] and in popular open-source software packages for spam filtering, classification, and general machine learning [3–7, 51].

The three types of classifiers differ in their underlying assumptions and how they learn parameters from training data. However, when applying a trained model, they all perform analogous linear operations. We will use Naive Bayes as a running example, because it is the simplest to explain.

Naive Bayes classifiers. These algorithms assume that a document can belong to one of several categories (for example, spam or non-spam). The algorithms output a prediction of a document’s category.

Documents (emails) are represented by feature vectors ~x = (x1, . . . , xN), where N is the total number of features. A feature can be a word, a group of words, or any other efficiently computable aspect of the document; the algorithms do not assume a particular mapping between documents and feature vectors, only that some mapping exists. In the GR-NB spam classifier [63, 101], xi is Boolean, and indicates the presence or absence of feature i in the document; in the multinomial NB text classifier, xi is the frequency of feature i.

The algorithms take as input a feature vector and a model that describes the categories. A model is a set of vectors {(~vj, p(Cj))} (1 ≤ j ≤ B), where Cj is a category (for example, spam or non-spam), and B is the number of categories (two for spam; 2208 for topics, based on Google’s public list of topics [8]). p(Cj) denotes the assumed a priori category distribution. The ith entry of ~vj is denoted p(ti | Cj) and is, roughly speaking, the probability that feature i, call it ti, appears in documents whose category is Cj.2

The GR-NB spam classification algorithm labels an email, as represented by feature vector ~x, as spam if p(spam | ~x) is greater than some fixed threshold. To do so, the algorithm computes α = 1/p(spam | ~x) − 1 in log space. One can show (Apdx A.1) that log α is equivalent to:

    ( Σ_{i=1}^{N} xi · log p(ti | C2) ) + 1 · log p(C2)
  − ( Σ_{i=1}^{N} xi · log p(ti | C1) ) − 1 · log p(C1),    (1)

where C1 represents spam and C2 represents non-spam.

For the multinomial NB text classifier, selection works by identifying the category Cj∗ that maximizes likelihood: j∗ = argmax_j p(Cj | ~x). One can show (Apdx A.2) that it suffices to select the Cj for which the following is maximal:

    ( Σ_{i=1}^{N} xi · log p(ti | Cj) ) + 1 · log p(Cj).    (2)

For LR and SVM classifiers, the term log p(ti | Cj) is replaced by a “weight” term wi,j for feature xi and category Cj, and log p(Cj) is replaced by a “bias” term bj for category j.

1 The original Graham-Robinson NB protects against spam emails that hide a short message within a large non-spam text [64]. We do not implement that piece; the resulting change in classification accuracy is small (§6.1).

2 In more detail, the GR-NB spam classifier assumes that the {xi} are realizations of independent, separate Bernoulli random variables (RVs), with the probabilities of each RV, p(ti | Cj), depending on the hypothesized category. The multinomial NB text classifier assumes that the {xi} follow a multinomial distribution, with N bins and Σ_i xi trials, where the bin probabilities are p(ti | Cj) and depend on the hypothesized category.
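To make this shared structure concrete, here is a minimal sketch (hypothetical Python with toy numbers; not the paper’s implementation) of applying a trained linear model: expressions (1) and (2), and their LR/SVM analogues, all reduce to one dot product per category plus a bias, followed by a comparison or an argmax.

    # Toy linear classification: score_j = (sum_i x_i * W[j][i]) + b[j].
    # For NB, W[j][i] = log p(t_i | C_j) and b[j] = log p(C_j); for LR/SVM,
    # W and b are learned weights and biases.

    def classify(x, W, b):
        """Return the index of the category with the maximal linear score."""
        scores = [sum(xi * wi for xi, wi in zip(x, Wj)) + bj
                  for Wj, bj in zip(W, b)]
        return max(range(len(scores)), key=scores.__getitem__)

    W = [[-1.2, -3.5, -0.4],   # toy stand-in for log p(t_i | spam)
         [-2.3, -0.7, -1.9]]   # toy stand-in for log p(t_i | non-spam)
    b = [-0.69, -0.69]         # toy log p(C_j): uniform prior
    x = [1, 0, 2]              # feature frequencies extracted from an email

    print(classify(x, W, b))   # here 0 = spam, 1 = non-spam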

3.2 Secure two-party computation

To perform the computation described above within a function module (§2.2) securely, i.e., in a way that the client does not learn the model parameters and the provider does not learn the feature vector, Pretzel uses secure two-party computation (2PC): cryptographic protocols that enable two parties to compute a function without revealing their inputs to each other [56, 128]. Pretzel builds on a relatively efficient 2PC protocol [19, 28, 69, 98, 102] that we name Yao+GLLM; we present this below, informally and bottom up. (For details and rigorous descriptions, see [55, 66, 84, 103].)

Yao’s 2PC. A building block of Yao+GLLM is the classic scheme of Yao [128]. Let f be a function, represented as a Boolean circuit (meaning a network of Boolean gates: AND, OR, etc.), with n-bit input, and let there be two parties P1 and P2 that supply separate pieces of this input, denoted x1 and x2, respectively. Then Yao (as the protocol is sometimes known), when run between P1 and P2, takes as inputs f and x1 from P1, and x2 from P2, and outputs f(x1, x2) to P2, such that neither party learns about the other’s input: P1 does not learn anything about x2, and P2 does not learn anything about x1 except what can be inferred from f(x1, x2).

At a very high level, Yao works by having one party generate encrypted truth tables, called garbled Boolean gates, for gates in the original circuit, and having the other party decrypt and thereby evaluate the garbled gates.

In principle, Yao handles arbitrary functions. In practice, however, the costs are high. A big problem is the computational model. For example, 32-bit multiplication, when represented as a Boolean circuit, requires on the order of 2,000 gates, and each of those gates induces cryptographic operations (encryption, etc.). Recent activity has improved the costs (see [61, 67, 77, 79, 83, 108, 133, 134] and references therein), but the bottom line is still too expensive to handle arbitrary computations. Indeed, Pretzel’s prototype uses Yao very selectively—just to compute several comparisons of 32-bit numbers—and even then it turns out to be a bottleneck (§6.1, §6.2), despite using a recent and optimized implementation [132, 133].
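For intuition about garbling, here is a toy sketch of a single garbled AND gate (hypothetical Python; it uses SHA-256 as the row-encryption function and omits oblivious transfer, point-and-permute, free-XOR, and everything else a real implementation such as Obliv-C provides).

    import os, random, hashlib

    LABEL = 16  # bytes per wire label

    def H(*parts):
        return hashlib.sha256(b"".join(parts)).digest()

    def xor(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    def garble_and_gate():
        # Two random labels per wire; index 0 encodes bit 0, index 1 bit 1.
        a, b, out = ([os.urandom(LABEL) for _ in range(2)] for _ in range(3))
        table = []
        for i in (0, 1):
            for j in (0, 1):
                # Row (i, j) encrypts the output label for i AND j, padded
                # with a zero tag so a correct decryption is recognizable.
                plain = out[i & j] + bytes(32 - LABEL)
                table.append(xor(H(a[i], b[j]), plain))
        random.shuffle(table)  # hide which row matches which input bits
        return a, b, out, table

    def evaluate(table, label_a, label_b):
        # The evaluator holds exactly one label per input wire (obtained,
        # in a real protocol, via oblivious transfer) and tries each row.
        pad = H(label_a, label_b)
        for row in table:
            plain = xor(pad, row)
            if plain[LABEL:] == bytes(32 - LABEL):
                return plain[:LABEL]
        raise ValueError("no row decrypted cleanly")

    a, b, out, table = garble_and_gate()
    assert evaluate(table, a[1], b[1]) == out[1]  # 1 AND 1 -> label for 1
    assert evaluate(table, a[0], b[1]) == out[0]  # 0 AND 1 -> label for 0

The per-gate ciphertexts and hashing are exactly the overhead described above: even a 32-bit multiplier built this way touches thousands of such tables.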

Secure dot products. Another building block of Yao+GLLM is a secure dot product protocol, specifically GLLM [55]. Many such protocols (also called secure scalar product (SSP) protocols) have been proposed [21, 25, 43–48, 55, 70, 105, 113, 114, 123, 124, 141–143]. They fall into two categories: those that are provably secure [43, 55, 123] and those that either have no security proof or require trusting a third party [21, 25, 44–48, 70, 105, 113, 114, 124, 141–143]. Several protocols in the latter category have been attacked [36, 55, 68, 75, 80]. GLLM [55] is in the first category, is state of the art, and is widely used.

Hybrid: Yao+GLLM. Pretzel’s starting point is Yao+GLLM, a hybrid of Yao and GLLM. It is depicted in Figure 2. One party starts with a matrix, and encrypts the entries. The other party starts with a vector and leverages additive (not fully) homomorphic encryption (AHE) to (a) compute the vector-matrix product in cipherspace, and (b) blind the resulting vector. The first party then decrypts to obtain the blinded vector. The vector then feeds into Yao: the two parties remove the blinding and perform some computation φ.

Yao+GLLM has been applied to spam filtering using LR [98], face recognition using SVM [19], and face and biometric identification using Euclidean distance [28, 69, 102].

Other related work. There are many works on private classification that do not build on Yao+GLLM. They rely on alternate building blocks or hybrids: additively homomorphic encryption [30, 85] (AHE), fully homomorphic encryption [78] (FHE), or a different Yao hybrid [33]. For us, Yao+GLLM appeared to be a more promising starting point (as examples, Yao+GLLM contains a packing optimization that significantly saves client-side storage resources relative to [30], and in contrast to [78], Yao+GLLM reveals only the final output of the computation rather than intermediate dot products).

Another related line of research focuses on privacy and linear classifiers—but in the training phase. Multiple parties can train a global model without revealing their private inputs [53, 54, 76, 110, 115, 116, 118, 127, 129–131, 136–138], or a party can release a trained “noisy” model that hides its training data [74, 117, 140]. These works are complementary to Pretzel’s focus on applying the model.

3.3 Baseline protocol

Pretzel begins by applying the Yao+GLLM protocol (Figure 2, §3.2) to the algorithms described in Section 3.1. This works because expressions (1) and (2) are dot products of the necessary form. Specifically, the provider is party A and supplies (~vj, p(Cj)), the client is party B and supplies (~x, 1), and the protocol computes their dot product. Then, the threshold comparison (for spam filtering) or the maximal selection (for topic extraction) happens inside an instance of Yao.


Yao+GLLM

• The protocol has two parties. Party A begins with a matrix; Party B begins with a vector. The protocol computes a vector-matrix product and then performs an arbitrary computation, φ, on the resulting vector; neither party’s input is revealed to the other.

• The protocol assumes an additively homomorphic encryption (AHE) scheme (Gen, Enc, Dec), meaning that Enc(pk, m1) · Enc(pk, m2) = Enc(pk, m1 + m2), where m1, m2 are plaintext messages, + represents addition of two plaintext messages, and · is an operation on the ciphertexts. This also implies that given a constant z and Enc(pk, m1), one can compute Enc(pk, z · m1).

Setup phase

1. Party A forms a matrix with columns ~v1, . . . ,~vB; each vector has N components. It does the following:

  • Generates public and secret keys (pk, sk) ← Gen(1^k), where k is a security parameter.

  • Encrypts each column component-wise, so Enc(pk,~vj) = (Enc(pk, v1,j), . . . , Enc(pk, vN,j)).

  • Sends the encrypted matrix columns and pk to Party B.

Computation phase

2. Party B begins with an N-component vector ~x = (x1, . . . , xN). It does the following:

  • (dot products) Computes an encrypted dot product for each matrix column: Enc(pk, dj) = Enc(pk, Σ_{i=1}^{N} xi · vi,j); this abuses notation, since the encryption function is not deterministic. The computation relies on the homomorphic property.

  • (blinding) Blinds dj by adding random noise nj ∈R {0,1}^{b+δ}. That is, computes Enc(pk, dj + nj) = Enc(pk, dj) · Enc(pk, nj). Here b is the bit-length of dj and δ ≥ 1 is a security parameter.

  • Sends (Enc(pk, d1 + n1), . . . , Enc(pk, dB + nB)) to Party A.

3. Party A applies Dec component-wise, to get (d1 + n1, . . . , dB + nB).

4. Party A and Party B participate in Yao’s 2PC protocol; they use a function f that subtracts the noise nj from dj + nj and applies the function φ to the dj. One of the two parties (which one depends on the arrangement) obtains the output φ(d1, . . . , dB).

Figure 2—Yao+GLLM. This protocol [19, 28, 69, 98, 102] combines GLLM’s secure dot products [55] with Yao’s general-purpose 2PC [128]. Pretzel’s design and implementation apply this protocol to the linear classifiers described in §3.1. The provider is Party A, and the client is Party B. Pretzel’s instantiation of this protocol incorporates several additional elements (§3.3): a variant of Yao [71, 77] that defends against actively adversarial parties; amortization of the expense of this variant via precomputation in the setup phase; a technique to defend against adversarial key generation (for example, not invoking Gen); and a packing technique (§4.2) in step 1 (bullet 2) and step 2 (bullet 1).
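As a concrete illustration of the mechanics in step 2, the following sketch instantiates the AHE with textbook Paillier [96] (hypothetical Python; the primes are tiny demo values and utterly insecure, and Pretzel itself later replaces Paillier with XPIR-BV, §4.1) and computes one blinded, encrypted dot product.

    import math, random

    # Textbook Paillier with toy primes (hopelessly insecure; demo only).
    p, q = 1_000_003, 1_000_033
    n, n2 = p * q, (p * q) ** 2
    lam = math.lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid because we take g = n + 1

    def enc(m):
        r = random.randrange(1, n)
        while math.gcd(r, n) != 1:
            r = random.randrange(1, n)
        return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

    def dec(c):
        return ((pow(c, lam, n2) - 1) // n) * mu % n

    def add(c1, c2):        # Enc(pk, m1) * Enc(pk, m2) = Enc(pk, m1 + m2)
        return (c1 * c2) % n2

    def cmul(c, z):         # Enc(pk, m) ** z = Enc(pk, z * m)
        return pow(c, z, n2)

    # Setup: Party A encrypts one model column and ships it to Party B.
    v = [3, 1, 4]                        # a column ~v_j (Party A's secret)
    enc_col = [enc(vi) for vi in v]

    # Computation: Party B folds in its features and blinds the result.
    x = [2, 0, 5]                        # feature vector ~x (Party B's secret)
    c = enc(0)
    for ci, xi in zip(enc_col, x):
        c = add(c, cmul(ci, xi))         # accumulates Enc(pk, sum x_i * v_i)
    noise = random.randrange(2 ** 20)    # n_j drawn from {0,1}^(b+delta)
    c = add(c, enc(noise))

    # Party A decrypts and sees only the blinded dot product d_j + n_j.
    assert dec(c) == sum(xi * vi for xi, vi in zip(x, v)) + noise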

For spam filtering, the client receives the classification output; for topic extraction, the provider does. Note that storing the encrypted model at the client is justified by an assumption that model vectors change infrequently.

In defining this baseline, we include mechanisms to defend against adversarial parties (§2.2). Specifically, whereas under the classical Yao protocol an actively adversarial party can obtain the other’s private inputs [71], Pretzel incorporates a variant [71, 77] that solves this problem. This variant brings some additional expense, but it can be incurred during the setup phase and amortized. Also, Yao+GLLM assumes that the AHE’s key generation is done honestly, whereas we would prefer not to make that assumption; Pretzel incorporates the standard response.3

3 In more detail, the AHE has public parameters which, if chosen adversarially (non-randomly), would undermine the expected usage. To get around this, Pretzel determines these parameters with Diffie-Hellman key exchange [42, 91], so that both parties inject randomness into these parameters.

While the overall baseline is literally new (Yao+GLLM was previously used in weaker threat models, etc.), its elements are well-known, so we do not claim novelty.

4 Pretzel’s protocol refinements

The baseline just described is a promising foundation. But adapting it to an end-to-end system for encrypted email requires work. The main issue is costs. As examples, for a spam classification model with N=5M features, the protocol consumes over 1 GB of client-side storage space; for topic extraction with B=2048 categories, it consumes over 150 ms of provider-side CPU time and 8 MB in network transfers (§6). Another thing to consider is the robustness of the guarantees.

This section describes Pretzel’s refinements, adjustments, and modifications. The nature of the work varies from low-level cryptographic optimizations, to architectural rearrangement, to applications of known ideas (in which case the work is demonstrating that they are suitable here). We begin with refinements that are aimed at reducing costs (§4.1–§4.3), the effects of which are summarized in Figure 3; then we describe Pretzel’s robustness to misbehaving parties (§4.4).

4.1 Replacing the cryptosystem

Both Pretzel’s protocol and the baseline require additively homomorphic encryption (Figure 2). The traditional choice for AHE—it is used in prior works [19, 69, 98, 102]—is Paillier [96], which is based on a longstanding number-theoretic presumed hardness assumption. However, Paillier’s Dec takes hundreds of microseconds on a modern CPU, which contributes substantially to provider-side CPU time.

Instead, Pretzel turns to a cryptosystem based on the Ring-LWE assumption [87], a relatively young assumption (which is usually a disadvantage in cryptography) but one that has nonetheless received a lot of recent attention [17, 31, 86, 88]. Specifically, Pretzel incorporates the additively homomorphic cryptosystem of Brakerski and Vaikuntanathan [32], as implemented and optimized by Melchor et al. [20] in the XPIR system; we call this XPIR-BV. This change brings the cost of each invocation of Dec down over an order of magnitude, to scores of microseconds (§6), and similarly with Enc. The gain is reflected in the cost model (Figure 3), in replacing dpail with dxpir (likewise with epail and expir, etc.).


                     Non-private     Baseline (§3.3)                           Pretzel (§4.1–§4.3)
Setup
  Provider CPU time  N/A             N·βpail·epail + Kcpu                      N′·β′xpir·expir + Kcpu
  Client CPU time    N/A             Kcpu                                      Kcpu
  Network transfers  N/A             N·βpail·cpail + Knet                      N′·β′xpir·cxpir + Knet
  Client storage     N/A             N·βpail·cpail                             N′·β′xpir·cxpir
Per-email
  Provider CPU       L·h + L·B·s     βpail·dpail + B·yper-in                   β′′xpir·dxpir + B′′·yper-in
  Client CPU         N/A             L·βpail·apail + βpail·epail + B·yper-in   L·βxpir·axpir + (L+B′)·s + β′′xpir·expir + B′′·yper-in
  Network            szemail         szemail + βpail·cpail + B·szper-in        szemail + β′′xpir·cxpir + B′′·szper-in

L = number of features (§3.3) in an email; h = CPU time to extract a feature and look up its conditional probabilities; B = number of categories in the model (§3.3); s = CPU time to add two probabilities; szemail = size of an email; N = number of features in the model (§3.3); N′ = number of features selected after aggressive feature selection (§4.3); e = encryption CPU time (in an AHE scheme); d = decryption CPU time (in an AHE scheme); c = ciphertext size in an AHE scheme; a = homomorphic addition CPU time (in an AHE scheme) (Fig. 2); b = log L + bin + fin (§4.2); bin = # of bits to encode a model parameter (§4.2); fin = # of bits for the frequency of a feature in an email (§4.2); p = # of b-bit probabilities packed in a ciphertext (§4.2); s = “left-shift” CPU time in XPIR-BV (§4.2); B′′ = B (if spam) or B′ = # of candidate topics (≪ B) (§4.3) (if topics); βpail := ⌈B/ppail⌉; βxpir := ⌈B/pxpir⌉; β′xpir := ⌊B/pxpir⌋ + 1/⌊pxpir/k⌋, where k = (B mod pxpir); β′′xpir := βxpir (if spam) or B′ (if topics); yper-in, szper-in = Yao CPU time and network transfers per b-bit input (§3.2); Kcpu, Knet = constants for CPU and network costs (§3.3).

Figure 3—Cost estimates for classification. Non-private refers to a system in which a provider locally classifies plaintext email. The baseline is described in Section 3.3. Microbenchmarks are given in §6.

However, the change makes ciphertexts 64× larger: from 256 bytes to 16 KB. Yet, this is not the disaster that it seems. Network costs do increase (in Figure 2, step 2, bullet 3), but by far less than 64×. Because the domain of the encryption function grows, one can tame what would otherwise be an explosion in network and storage, and also gain further CPU savings. We describe this next.

4.2 Packing

The basic idea is to represent multiple plaintext elements in a single ciphertext; this opportunity exists because the domain of Enc is much larger than any single element that needs to be encrypted. Using packing, one can reduce the number of invocations of Enc and Dec in Figure 2, specifically in step 1 bullet 2, step 2 bullet 2, and step 3. The consequence is a significant reduction in resource consumption, specifically provider CPU time and storage for spam filtering, and provider CPU time for topic extraction.

The challenge is to preserve the semantic content of the underlying operation (dot product) within cipherspace. Below, we describe prior work, and then how Pretzel overcomes a limitation in this work.

Prior work. A common packing technique is as follows. Let G denote the number of bits in the domain of the encryption algorithm Enc, bin denote the number of semantically useful bits in an element that would be encrypted (a model parameter in our case), and fin denote the number of bits for the multiplier of an encrypted element (the frequency of a feature extracted from an email in our case). The output of a dot product computation—assuming a sum of L products, each formed from a bin-bit element and a fin-bit element—has b = log L + bin + fin “semantic bits” (in our context, L would be the number of features extracted from an email). This means that there is “room” to pack p = ⌊G/b⌋ b-bit elements into a single ciphertext.

Figure 4—Packing in Pretzel.

GLLM [55] incorporates this technique; this takes place in Figure 2, step 1, bullet 2 (though it isn’t depicted in the figure). In that step, GLLM traverses each row in its matrix from left to right, encrypting together sets of p numbers. If fewer than p numbers are left, the scheme encrypts those together but does not cross a row. Then, in step 2, bullet 1, it performs dot products on the packed ciphertexts, by exploiting the fact that the elements that need to be added are aligned. For example, if the elements in the first row (v1,1, . . . , v1,p) are to be added to those in the second row (v2,1, . . . , v2,p), then the ciphertext space operation applied to c1 = Enc(pk, v1,1‖ . . . ‖v1,p) and c2 = Enc(pk, v2,1‖ . . . ‖v2,p) yields c3 = c1 · c2 = Enc(pk, v1,1 + v2,1‖ . . . ‖v1,p + v2,p). For this to work, the individual sums (e.g., v1,p + v2,p) cannot overflow b bits.

Pretzel’s packing scheme. With the technique above, the “last ciphertext” in a row contains extra space whenever B is larger than a multiple of p; this is because ciphertexts from different rows are not packed together. If ⌈B/p⌉ is large, then this waste is insignificant. However, if B is small—and in particular is much smaller than p—then the preceding technique leaves a substantial optimization opportunity on the table. For example, if B = 2 (as in the spam filtering application) and p = 1024 (as in the XPIR-BV cryptosystem), then the technique described above packs two elements per ciphertext, gaining a factor of two, when a factor of 1024 was possible in principle.

Pretzel’s packing technique “crosses rows.” Specifically, Pretzel splits the matrix {(~vj, p(Cj))} into zero or more sets of p column vectors, plus up to one set with fewer than p vectors, as depicted in Figure 4. For the sets with p vectors, Pretzel follows the technique above. For the last set, Pretzel packs elements in row-major order without restricting the packing to be within a row (see the rightmost matrix in Figure 4), under one constraint: elements in the same row of the matrix must not be put into different ciphertexts.

The challenge with across-row packing is being able to add elements from two different rows that are packed in the same ciphertext (in step 2, bullet 1). To address this challenge, Pretzel exploits the homomorphism to cyclically rotate the packed elements in a ciphertext. For example, assume c = Enc(pk, v1,1‖ . . . ‖v1,k‖v2,1‖ . . . ‖v2,k) is a packed ciphertext, where v1,1, . . . , v1,k are elements from the first row, and v2,1, . . . , v2,k are from the second row. To add each v1,i to v2,i for i ∈ {1, . . . , k}, one can “left-shift” the elements in c by k positions to get c′ = Enc(pk, v2,1‖ . . . ‖v2,k‖ . . .); this is done by applying the “constant multiplication” operation (Figure 2, bullet 2), with z = 2^{k·b}. At this point, the rows are “lined up,” and one can operate on c and c′ to add the plaintext elements.
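The slot arithmetic can be previewed on plaintext integers (a hypothetical Python sketch; in the actual protocol the shift is the multiply-by-constant homomorphic operation with z = 2^{k·b}, and the addition is the ciphertext operation ·).

    B_BITS = 20                  # b: semantic bits per packed element
    K = 3                        # k: elements per row packed together
    SLOTS = 2 * K                # this toy ciphertext holds two rows
    G = SLOTS * B_BITS           # width of the plaintext domain, in bits

    def pack(elems):
        """Pack b-bit elements into one integer, first element in the high bits."""
        acc = 0
        for e in elems:
            acc = (acc << B_BITS) | e
        return acc

    def unpack(packed, count):
        mask = (1 << B_BITS) - 1
        return [(packed >> (B_BITS * (count - 1 - i))) & mask
                for i in range(count)]

    row1, row2 = [7, 11, 13], [100, 200, 300]
    c = pack(row1 + row2)

    # "Left-shift" by k positions: multiply by 2^(k*b), modulo the domain width.
    c_shifted = (c << (K * B_BITS)) % (1 << G)

    # Adding the two packed values lines up v1_i + v2_i in the top k slots,
    # provided each per-slot sum stays below 2^b (no carries across slots).
    total = c + c_shifted
    assert unpack(total, SLOTS)[:K] == [a + b for a, b in zip(row1, row2)]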

Cost savings. Here we give rough estimates of the effect of the refinements in this subsection and the previous one; a more detailed evaluation is in Section 6. For the spam filtering module, the provider’s CPU drops by 5× and the client-side storage drops by 7×, relative to the baseline (§3.3). However, CPU at the client increases by 10× (owing to the cyclic shifts), and the network overhead increases by 5.4×; despite these increases, both costs are not exorbitant in absolute terms, and we view them as tolerable (§6.1, §6.3). The provider-side costs for spam filtering are comparable to an arrangement where the provider classifies plaintext emails non-privately.

For the topic extraction module, the cost improvements relative to the baseline (§3.3) are smaller: provider CPU drops by 1.37×, client CPU drops by 3.25×, storage goes up by a factor of 2, and the network cost goes up slightly. Beyond that, the non-private version of this function is vastly cheaper than for spam, to the point that the private version is up to two orders of magnitude worse than a non-private version (depending on the resource). The next section addresses this.

4.3 Rearranging and pruning in topic extraction

Decomposed classification. So far, many of the costs are proportional to B: storage (see Figure 2, “setup phase”), CPU cost of Yao, and network cost of Yao (step 4). For spam filtering, this is not a problem (B = 2), but for topic extraction, B can be in the thousands.

Pretzel’s response is a technique that we call decomposed classification. To explain the idea, we regard topic extraction as abstractly mapping an email, together with a set S of cardinality B (all possible topics), down to a set S∗ of cardinality 1 (the chosen topic), using a proprietary model. Pretzel decomposes this map into two:

(i) Map the email, together with the set S, to a set S′ of cardinality B′ (for example, B′ = 20); S′ comprises candidate topics. The client does this by itself.

(ii) Map the email, together with S′, down to a set S′′ of cardinality 1; ideally S′′ is the same as S∗ (otherwise, accuracy is sacrificed). This step relies on a proprietary model and is done using secure two-party machinery. Thus, the expensive part of the protocol is now proportional to B′ rather than B.

For this arrangement to make sense, several requirements must be met. First, the client needs to be able to perform the map in step (i) locally. Here, Pretzel exploits an observation: topic lists (the set S) are public today [8]. They have to be, so that advertisers can target and users can set interests. Thus, a client can in principle use some non-proprietary classifier for step (i). Pretzel is agnostic about the source of this classifier; it could be supplied by the client, the provider, or a third party.

Second, the arrangement needs to be accurate, which it is when S′ contains S∗. Pretzel observes that although the classifier used in step (i) would not be honed, it doesn’t need to be, because it is performing a far coarser task than choosing a single topic. Thus, in principle, the step (i) map might reliably produce accurate outputs—meaning that the true topic, S∗, is among the B′ candidates—without much training, expertise, or other proprietary input. Our experiments confirm that indeed the loss of end-to-end accuracy is small (§6.2).

Finally, step (ii) must not reveal S′ to the provider, since that would be more information than a single extracted topic. This rules out instantiating step (ii) by naively applying the existing protocol (§3.3–§4.2), with S′ in place of S. Pretzel’s response is depicted in Figure 5. There are some low-level details to handle because of the interaction with packing (§4.2), but at a high level, this protocol works as follows. The provider supplies the entire proprietary model (with all B topics); the client obtains B dot products, in encrypted form, via the inexpensive component of Yao+GLLM (secure dot product). The client then extracts and blinds the B′ dot products that correspond to the candidate topics. The parties finish by using Yao to privately identify the topic that produced the maximum.
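A sketch of step (i) (hypothetical Python; the public linear model here stands in for whatever non-proprietary classifier the client chooses to use):

    def candidate_topics(x, public_W, public_b, b_prime):
        """Step (i): locally rank all B topics with a public linear model
        and return the indexes of the b_prime highest-scoring ones (S')."""
        scores = [sum(xi * wi for xi, wi in zip(x, Wj)) + bj
                  for Wj, bj in zip(public_W, public_b)]
        ranked = sorted(range(len(scores)), key=scores.__getitem__,
                        reverse=True)
        return sorted(ranked[:b_prime])  # S': candidate column indexes

    # Toy numbers: B = 5 topics, 3 features, B' = 2 candidates.
    public_W = [[0.1, 0.0, 0.3], [0.9, 0.2, 0.0], [0.0, 0.8, 0.1],
                [0.4, 0.4, 0.4], [0.2, 0.1, 0.9]]
    public_b = [0.0] * 5
    S_prime = candidate_topics([1, 0, 2], public_W, public_b, 2)
    # Step (ii) then runs the protocol of Figure 5 over S_prime only.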

Feature selection. Protocol storage is proportional to N (Figure 2, “setup phase”). Pretzel’s response is the standard technique of feature selection [111]: incorporating into the model the features most helpful for discrimination. This takes place in the “setup phase” of the protocol (the number of rows in the provider’s matrix reduces from N to N′). Of course, one presumes that providers already prune their models; the proposal here is to do so more aggressively. Section 6.2 shows that in return for large drops in the number of considered features, the accuracy drops only modestly. In fact, a reduction of 75% in the number of features is a plausible operating point.


Pretzel’s protocol for proprietary topic extraction, based on candidate topics

• The protocol has two parties. Party A begins with a matrix ~v1, . . . ,~vB. Party B begins with a vector ~x = (x1, . . . , xN) and a list S′ of B′ < B column indexes, where each index is between 1 and B; S′ indicates a subset of the columns of the matrix ~v. The protocol constructs a vector from the product of ~x and the submatrix of ~v given by S′, and outputs the column index (in ~v) that corresponds to the maximum element in the vector-submatrix product; neither party’s input is revealed to the other.

• The protocol has two phases: setup and computation. The setup phase is as described in Figure 2, but with the addition of packing from §4.2.

Computation phase

3. Party B does the following:

  • (compute dot products) As described in Figure 2, step 2, bullet 1, and §4.2. At the end of the dot product computations, it gets a vector of packed ciphertexts ~pcts = (Enc(pk, d1‖ . . . ‖dp), . . . , Enc(pk, . . . ‖dB‖ . . .)), where di is the dot product of ~x and the i-th matrix column ~vi, and p is the number of b-bit positions in a packed ciphertext (§4.2).

  • (separate out dot products for the columns in S′ from the rest) For each entry in S′, i.e., S′[j], makes a copy of the packed ciphertext containing dS′[j], and shifts dS′[j] to the left-most b-bit position in that ciphertext. Because each ciphertext holds p elements, the separation works by using the quotient and remainder of S′[j], when divided by p, to identify, respectively, the relevant packed ciphertext and the position within it. That is, for 1 ≤ j ≤ B′, computes ciphertext Enc(pk, dS′[j]‖ . . .) = ~pcts[Qj] · 2^{b·Rj}, where Qj = ⌈S′[j]/p⌉ − 1 and Rj = (S′[j] − 1) mod p. The shifting relies on the multiply-by-constant homomorphic operation (see Figure 2 and §4.2).

  • (blinding) Blinds dS′[j] using the technique described in Figure 2, step 2, bullet 2, but extended to packed ciphertexts. Sends the B′ ciphertexts (Enc(pk, dS′[1] + n1‖ . . .), . . . , Enc(pk, dS′[B′] + nB′‖ . . .)) to Party A. Here, nj is the added noise.

4. Party A applies Dec on the B′ ciphertexts, followed by a bitwise right shift on the resulting plaintexts, to get dS′[1] + n1, . . . , dS′[B′] + nB′.

5. The two parties engage in Yao’s 2PC. Party B’s private inputs are S′ and {nj} for 1 ≤ j ≤ B′; Party A’s private inputs are {(dS′[j] + nj)} for 1 ≤ j ≤ B′; and the parties use a function f that subtracts nj from dS′[j] + nj, and computes and returns S′[argmaxj dS′[j]] to Party A.

Figure 5—Protocol for proprietary topic extraction, based on candidate topics (this instantiates step (ii) in Section 4.3). The provider is Party A; the client is Party B. This protocol builds on the protocol presented in §3.3–§4.2.
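The bookkeeping in step 3, second bullet, is ordinary division with remainder; a small sketch (hypothetical Python, with a toy value for p) of how an index in S′ selects a packed ciphertext and a shift constant:

    import math

    def locate(col_index, p):
        """Map a 1-based column index S'[j] to (Q_j, R_j): which packed
        ciphertext holds d_{S'[j]}, and which b-bit slot it occupies.
        The homomorphic shift constant is then z = 2 ** (b * R_j)."""
        Qj = math.ceil(col_index / p) - 1
        Rj = (col_index - 1) % p
        return Qj, Rj

    p = 4  # toy: slots per packed ciphertext
    assert locate(1, p) == (0, 0)  # first column: first ciphertext, slot 0
    assert locate(5, p) == (1, 0)  # fifth column: second ciphertext, slot 0
    assert locate(7, p) == (1, 2)  # seventh column: second ciphertext, slot 2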

Cost savings. The client-side storage costs reduce by the factor N/N′ due to feature selection. For B=2048, B′=20, and L=692 (the average number of features per email in the authors’ emails), relative to the protocol in §4.2, the provider CPU drops by 45×, client CPU drops by 8.4×, and the network transfers drop by 20.4× (§6.2). Thus, the aforementioned two orders of magnitude (above the non-private version) becomes roughly 5×.

4.4 Robustness to misbehaving parties

Pretzel aims to provide the following guarantees, even when parties deviate from the protocol:

1. The client and provider cannot (directly) observe each other’s inputs nor any intermediate state in the computation.

2. The client learns at most 1 bit of output each time spam classification is invoked.

3. The provider learns at most log B bits of output per email. This comes from topic extraction.

Guarantee 1 follows from the baseline protocol, which includes mechanisms that thwart the attempted subversion of the protocol (§3.3). Guarantee 2 follows from Guarantee 1 and the fact that the client is the party who gets the spam classification output. Guarantee 3 follows similarly, provided that the client feeds each email into the protocol at most once; we discuss this requirement shortly.

Before continuing, we note that the two applications are not symmetric. In spam classification, the client, who gets the output, could conceivably try to learn the provider’s model; however, the provider does not directly learn anything about the client’s email. With topic extraction, the roles are reversed. Because the output is obtained by the provider, what is potentially at risk is the privacy of the email of the client, who instead has no access to the provider’s model.

Semantic leakage. Despite its guarantees about the number of output bits, Pretzel has nothing to say about the meaning of those bits. For example, in topic extraction, an adversarial provider could construct a tailored “model” to attack an email (or the emails of a particular user), in which case the log B bits could yield important semantic information about the email. A client who is concerned about this issue has several options, including opting out of topic extraction (and presumably compensating the provider for service, since a key purpose of topic extraction is ad display, which generates revenue). We describe a more mischievous response below.

In the spam application, an adversarial client could construct emails to try to pinpoint or infer model parameters, and then leak the model to spammers (and more generally undermine the proprietary nature of the model). A possible defense would be for the provider to periodically revise the model (and maintain different versions).

Repetition and replay. An adversarial provider could conceivably replay a given email to a client k times. The provider would then get k·log B bits from the email, rather than log B. Our defense is simply for the client to regard email transmission from each sender (and possibly each sender’s device) as a separate asynchronous—and lossy and duplicating—transmission channel. Solutions to detecting duplicates over such channels are well-understood: counters, windows, etc. Something to note is that, for this defense to work, emails have to be signed; otherwise an adversary can deny service by pretending to be a sender and spuriously exhausting counters.
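One possible shape for such a duplicate detector is sketched below (hypothetical Python; it assumes each email carries a per-sender counter covered by the sender’s signature).

    class ReplayFilter:
        """Per-sender duplicate detection over a sliding window of counters.
        An email is fed into the 2PC protocols only if fresh() returns True;
        replays, and duplicates too old to track, are dropped."""

        def __init__(self, window=1024):
            self.window = window
            self.highest = {}   # sender -> highest counter seen so far
            self.seen = {}      # sender -> counters seen within the window

        def fresh(self, sender, counter):
            hi = self.highest.get(sender, -1)
            seen = self.seen.setdefault(sender, set())
            if counter <= hi - self.window or counter in seen:
                return False    # a replay, or below the tracking window
            seen.add(counter)
            if counter > hi:
                self.highest[sender] = counter
                seen -= {c for c in seen if c <= counter - self.window}
            return True

    f = ReplayFilter(window=4)
    assert f.fresh("alice@example.com", 1)
    assert not f.fresh("alice@example.com", 1)  # exact replay is dropped
    assert f.fresh("alice@example.com", 3)      # gaps are fine: lossy channel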

Integrity. Pretzel does not offer any guarantees about which function Yao actually computes. For example, for topic extraction, the client could garble (§3.2) argmin rather than argmax. In this case, the aforementioned guarantees continue to hold (no inputs are disclosed, etc.), though of course this misbehavior interferes with the ultimate functionality. Pretzel does not defend against this case, and in fact, it could be considered a feature—it gives the client a passive way to “opt out,” with plausible deniability (the client could garble a function that produces an arbitrary choice of index).

The analogous attack, for spam, is for the provider to garble a function other than threshold comparison. This would undermine the spam/non-spam classification and would presumably be disincentivized by the same forces incentivizing providers to supply spam filtering as a service in the first place.

5 Other details and implementation

Keyword search in Pretzel. So far, Pretzel’s design has focused on modules for classification, but for completeness, Pretzel also needs to handle a third function: keyword search. Pretzel’s variant of keyword search is a simple existence proof that the provider’s servers are not essential: the client simply maintains and queries a client-side search index. This brings a storage cost, but our evaluation suggests that it is tolerable (§6.3). One issue is that an index may not be available if a user logs in from a new machine. This can be addressed with a provider-side solution, which is future work (such a solution could be built on searchable symmetric encryption, which allows search over encrypted data [34, 35, 72, 97]).
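For concreteness, here is roughly what such a client-side index looks like with Python’s sqlite3 module (a sketch; it assumes the bundled SQLite is compiled with the FTS4 module, as most distributions’ builds are, and the file name is made up):

    import sqlite3

    db = sqlite3.connect("mail_index.db")  # hypothetical local index file
    db.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS emails USING fts4(subject, body)")

    def index_email(subject, body):
        # Called after the e2e module decrypts an email (Figure 1, step 2).
        db.execute("INSERT INTO emails (subject, body) VALUES (?, ?)",
                   (subject, body))
        db.commit()

    def search(query):
        # Full-text query evaluated entirely at the client; the provider
        # never sees the query or the plaintext.
        return db.execute(
            "SELECT rowid, subject FROM emails WHERE emails MATCH ?",
            (query,)).fetchall()

    index_email("lunch plans", "pretzels at noon?")
    print(search("pretzels"))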

Implementation details. We have fully implemented the design described in §2.2, §3.3, §4, and the keyword search module just described. We borrow code from existing libraries: Obliv-C [132] for Yao’s 2PC protocol;4 XPIR [20] for the XPIR-BV AHE scheme; LIBLINEAR [24, 51] to train LR and SVM classifiers; and SQLite FTS4 [9] for the search index. Besides the libraries, Pretzel is written in 5,300 lines of C++ and 160 lines of Python.

4 Another choice would have been TinyGarble [108]. We found the performance of Obliv-C and TinyGarble to be comparable for the functions we compute inside Yao in Pretzel; we choose the former because it is easier to integrate with Pretzel’s C++ code.

6 Evaluation

Our evaluation answers the following questions:

1. What are the provider- and client-side overheads of Pretzel? For what configurations (model size, email size, etc.) are they low?

2. How much do Pretzel’s optimizations help in reducingthe overheads?

3. How accurate are Pretzel's functions (for example, how accurately can Pretzel filter spam emails or extract the topics of emails)?

A summary of the evaluation results is as follows:

• Pretzel's provider-side CPU consumption for spam filtering and topic extraction is, respectively, 0.65× and 1.03–1.78× that of a non-private arrangement, and, respectively, 0.17× and 0.01–0.02× that of its baseline (§3.3).

• Network transfers in Pretzel are 2.7–5.4× those of a non-private arrangement, and 0.024–0.048× those of its baseline (§3.3).

• Pretzel's client-side CPU consumption is less than 1 s per email, and its storage space use is a few hundred MBs. These are a few factors lower than in the baseline (§3.3).

• For topic extraction, the potential coarsening effects of Pretzel's classifiers (§4.3) amount to a drop in accuracy of 1–3%.

Method and setup. We consider spam filtering, topic extraction, and keyword search separately.

For spam filtering and topic extraction, we compare Pretzel to its starting baseline (§3.3), which we call Baseline, and to NoPriv, which models the status quo, in which the provider locally runs classification on plaintext email contents. For the keyword search function, we consider only the basic scheme based on a client-side search index (§5).

We vary the following parameters: the number of features (N) and categories (B) in the classification models, the number of features in an email (L), and the number of candidate topics (B′) in topic extraction. For the classification models, we use both synthetic and real-world datasets. To generate synthetic emails, we use random words (between 4 and 12 letters each). For the spam filtering evaluation, we use the Ling-spam [22] (481 spam and 2,411 non-spam emails), Enron [10] (17,148 spam and 16,555 non-spam emails of about 150 Enron employees), and Gmail (355 spam and 600 non-spam emails from the Gmail account of one of the authors; Gmail stores spam for only a limited time, so we do not have access to a large number of spam emails, and the non-spam emails are from the same time period as the spam emails) datasets. For the topic extraction evaluation, we use the 20 Newsgroups [11] (18,846 Usenet posts on 20 topics), Reuters-21578 [12] (12,603 newswire stories on 90 topics), and RCV1 [81] (806,778 newswire stories from 296 regions) datasets.

We measure resource overheads in terms of the provider- and client-side CPU times to process an email, the network transfers between provider and client, and the storage space used at a client. The resource overheads are independent of the classification algorithm (NB, LR, SVM), so we present them once; the accuracies depend on the classification algorithm, so we present them for each algorithm.

Our testbed is Amazon EC2. We use one machine of type m3.2xlarge for the provider and one machine of the same type for a client. At the provider, we use an independent CPU for each module/function. Similarly, the client uses a single CPU.


            encryption   decryption   addition   left shift and add
GPG         1.7 ms       1.3 ms       N/A        N/A
Paillier    2.5 ms       0.7 ms       7 µs       N/A
XPIR-BV     103 µs       31 µs        3 µs       70 µs

Yao cost                   CPU     network transfers
φ = integer comparison     71 µs   2501 B
φ = argmax                 70 µs   3959 B

                    map lookup   float addition
NoPriv operations   0.17 µs      0.001 µs

Figure 6—Microbenchmarks for the common operations (Figure 3) in Pretzel and the baselines. CPU costs come from a single CPU of an Amazon EC2 machine of type m3.2xlarge. Both CPU and network costs are averaged over 1,000 runs; standard deviations (not shown) are within 5% of the averages. GPG encryption/decryption times depend on the length of the email; we use an email size of 75 KB, which is in line with average email size [13]. Similarly, Yao costs for φ = argmax depend linearly on the number of input values; we show costs per input value.

[Figure 7: bar graph; y-axis: CPU time per email (µs), lower is better; x-axis: number of features in the model (N = 200K, 1M, 5M); series: NoPriv (L=200), NoPriv (L=1K), NoPriv (L=5K), Baseline, Pretzel.]

Figure 7—Provider-side CPU time per email in microseconds for the spam filtering module while varying the number of features (N) in the spam classification model and the number of features (L) in an email. CPU time for NoPriv varies only slightly with N (not visible), while (provider-side) CPU times for Baseline and Pretzel are independent of both L and N (Figure 3).

Microbenchmarks. Figure 6 shows the CPU and network costs for the common operations (Figure 3) in Pretzel and the baselines. We will use these microbenchmarks to explain the performance evaluation in the next subsections.

6.1 Spam filtering

This subsection reports the resource overheads (provider- and client-side CPU time, network transfers, and client-side storage space use) and the accuracy of spam filtering in Pretzel.

We set three different values for the number of features in the spam classification model: N = {200K, 1M, 5M}. These values correspond to the typical number of features in various deployments of Bayesian spam filtering software [14–16]. We also vary the number of features in an email (L = {200, 1000, 5000}); these values are chosen based on the Ling-spam dataset (an average of 377 and a maximum of 3,638 features per email) and the Gmail dataset (an average of 692 and a maximum of 5,215 features per email). The number of categories B is two: spam and non-spam.

Provider-side CPU time. Figure 7 shows the per-email CPU time consumed by the provider.

Size                  N=200K    N=1M       N=5M
Non-encrypted         4.3 MB    21.5 MB    107.3 MB
Baseline              51.6 MB   258.0 MB   1.3 GB
Pretzel-NoOptimPack   3.1 GB    15.3 GB    76.3 GB
Pretzel               7.4 MB    36.7 MB    183.5 MB

Figure 8—Size of encrypted and plaintext spam classification models. N is the number of features in the model. Pretzel-NoOptimPack is Pretzel with the legacy packing technique (§4.2).

For emails with fewer features (L=200), the CPU time of Pretzel is 2.7× NoPriv's and 0.17× Baseline's. Pretzel's is higher than NoPriv's because in NoPriv the provider does L feature extractions, map lookups, and float additions, which are fast operations (Figure 6), whereas in Pretzel the provider does relatively expensive operations: one additively homomorphic decryption of an XPIR-BV ciphertext plus one comparison inside Yao (Figure 3 and §4.1). Pretzel's CPU time is lower than Baseline's because in Pretzel the provider decrypts an XPIR-BV ciphertext whereas in Baseline the provider decrypts a Paillier ciphertext (Figure 6).
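The microbenchmarks in Figure 6 let us reconstruct this comparison with back-of-the-envelope arithmetic; the sketch below does so for L = 200. It deliberately ignores feature extraction (which is not itemized in Figure 6) and other costs common to the systems, which is why the ratios come out near, but not exactly at, the measured 2.7× and 0.17×.

# Rough provider-side cost model at L = 200, using Figure 6 numbers (in µs).
MAP_LOOKUP, FLOAT_ADD = 0.17, 0.001     # NoPriv: per-feature operations
XPIR_DECRYPT, YAO_CMP = 31, 71          # Pretzel: one decryption + one Yao comparison
PAILLIER_DECRYPT = 700                  # Baseline: Paillier decryption (0.7 ms)

L = 200
nopriv   = L * (MAP_LOOKUP + FLOAT_ADD)  # ~34 µs, grows linearly in L
pretzel  = XPIR_DECRYPT + YAO_CMP        # ~102 µs, independent of L
baseline = PAILLIER_DECRYPT + YAO_CMP    # ~771 µs, independent of L
print(pretzel / nopriv, pretzel / baseline)   # roughly 3.0 and 0.13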

As the number of features in an email increases (L={1000, 5000}), the provider's CPU time in both Pretzel and Baseline does not change, as it is independent of L (unlike the client's), while NoPriv's increases, since it is linear in L (see Figure 3). A particular point of interest is L=692 (the average number of features per email in the Gmail dataset), for which the CPU time of Pretzel is 0.65× NoPriv's.

Client-side overheads. Figure 8 shows the size of the spam model for the various systems. We notice that the model in Pretzel is approximately 7× smaller than the model in Baseline. This is due to Pretzel's "across-row" packing technique (§4.2). We also notice that, given its other refinements, Pretzel's packing technique is essential (the Pretzel-NoOptimPack row in the figure).
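To give intuition for the packed operations that dominate the client's cost (quantified next), here is a plaintext simulation of one plausible slot layout: several per-category model values are packed into fixed-width slots of a single integer, so one addition updates all slots at once, and multiplying by a small feature count reduces to the left-shift-and-add pattern. The slot width and layout are illustrative assumptions; in Pretzel these are operations on AHE ciphertexts, not plain integers.

W = 32   # bits per slot; must exceed log2 of the largest possible slot value
B = 4    # values packed per row in this toy example

def pack(vals):
    acc = 0
    for v in reversed(vals):          # slot j holds vals[j]
        acc = (acc << W) | v
    return acc

def unpack(packed):
    return [(packed >> (W * j)) & ((1 << W) - 1) for j in range(B)]

def mul_small(packed, x):
    # multiply every slot by x via x's binary expansion: left shift and add
    acc, addend = 0, packed
    while x:
        if x & 1:
            acc += addend
        addend <<= 1                  # doubles all slots simultaneously
        x >>= 1
    return acc

rows = [pack([1, 2, 3, 4]), pack([5, 6, 7, 8])]   # model rows for two features
x = [3, 2]                                        # the email's feature counts
dot = sum(mul_small(row, xi) for row, xi in zip(rows, x))
print(unpack(dot))   # [13, 18, 23, 28]: all four dot products at once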

In terms of client-side CPU time, Pretzel takes ≈358 ms to process an email with many features (L=5000) against a large model (N=5M). This time is dominated by the L left-shift-and-add operations in the secure dot product computation (§4.2). Our microbenchmarks (Figure 6) explain this number: 5,000 left-shift-and-add operations take 5000 × 70 µs = 350 ms. A large L is an unfavorable scenario for Pretzel: client-side processing is proportional to L (Figure 3).

Network transfers. Both Pretzel and Baseline add network overhead relative to NoPriv: respectively, 19.6 KB and 3.6 KB per email (or 26.1% and 4.8% of NoPriv's transfers, when considering the average email size reported by [13]). These overheads are due to the transfer of a ciphertext and of a comparison inside Yao's framework (Figure 2). Pretzel's overheads are higher than Baseline's because the XPIR-BV ciphertext in Pretzel is much larger than the Paillier ciphertext.


         Ling-spam              Enron                  Gmail
         Acc.   Prec.  Rec.     Acc.   Prec.  Rec.     Acc.   Prec.  Rec.
GR-NB    99.4   98.1   98.1     98.8   99.2   98.4     98.1   99.7   95.2
LR       99.4   99.4   97.1     98.9   98.4   99.5     98.5   98.9   97.2
SVM      99.4   99.2   97.5     98.7   98.5   99.0     98.5   98.9   97.2
GR       99.3   98.1   97.9     98.8   99.2   98.4     98.1   99.7   95.2

Figure 9—Accuracy (Acc.), precision (Prec.), and recall (Rec.) for spam filtering in Pretzel. Sets of columns correspond to the different spam datasets, and the rows correspond to the classification algorithms Pretzel supports: GR-NB, binary LR, and two-class SVM (§3.1). Also shown is the accuracy of Graham-Robinson's original Naive Bayes algorithm (GR).

[Figure 10: bar graph; y-axis: CPU time per email (ms), log-scaled, lower is better; x-axis: number of categories in the topic extraction model (B = 128, 512, 2048); series: NoPriv, Baseline, Pretzel (B′=B), Pretzel (B′=20), Pretzel (B′=10).]

Figure 10—Provider CPU time per email in milliseconds for topic extraction, varying the number of categories (B) in the model and the number of candidate topics (B′). The case B′ = B measures Pretzel without the decomposed classification technique (§4.3). The y-axis is log-scaled. N and L are set to 100K and 692 (the average number of features per email in the authors' Gmail dataset). The CPU times do not depend on N or L for Pretzel and Baseline; they increase linearly with L and vary slightly with N for NoPriv.

Accuracy. Figure 9 shows Pretzel's spam classification accuracy for the different classification algorithms it supports. (The figure also shows precision and recall. Higher precision means fewer false positives, that is, non-spam falsely classified as spam; higher recall means fewer false negatives, that is, spam falsely classified as non-spam.)

6.2 Topic extraction

This subsection reports the resource overheads (provider- and client-side CPU time, network transfers, and client-side storage space use) and the accuracy of topic extraction in Pretzel.

We experiment with N = {20K, 100K} (the number of features in topic extraction models is usually much lower than in spam models, because spam exhibits many different variations of words, e.g., FREE, FR33, etc.) and B = {128, 512, 2048}. These parameters are based on the total number of features in the topic extraction datasets we use and on Google's public list of topics (2,208 topics [8]). For the number of candidate topics in Pretzel (§4.3), we experiment with B′ = {5, 10, 20, 40}.

Provider-side CPU time. Figure 10 shows the per-email CPU time consumed by the provider.

Without decomposed classification (§4.3), which is the B′ = B case in the figure, Pretzel's CPU time is significantly higher than NoPriv's but lower than Baseline's. The difference between Pretzel and Baseline reflects two facts. First, XPIR-BV ciphertexts are cheaper to decrypt than Paillier ciphertexts. Second, there are fewer XPIR-BV ciphertexts, because each is larger and can thus pack in more plaintext elements (§4.2).

Network transfers    B=128      B=512      B=2048
Baseline             501.5 KB   2.0 MB     8.0 MB
Pretzel (B′=B)       516.6 KB   2.0 MB     8.0 MB
Pretzel (B′=20)      402.0 KB   402.0 KB   401.9 KB
Pretzel (B′=10)      201.0 KB   201.0 KB   201.2 KB

Figure 11—Network transfers per email for the topic extraction module in Pretzel and Baseline. B′ is the number of candidate topics in decomposed classification (§4.3). Network transfers are independent of the number of features in the model (N) and in the email (L) (Figure 3).

Size            N=20K      N=100K
Non-encrypted   144.3 MB   769.4 MB
Baseline        288.4 MB   1.5 GB
Pretzel         720.7 MB   3.8 GB

Figure 12—Size of topic extraction models for the various systems. N is the number of features in the model. B is set to 2048.

With decomposed classification, the number of comparisons inside Yao's framework comes down and, as expected, the difference between the CPU times of Pretzel and NoPriv drops (§4.3). For B=2048, B′=20, Pretzel's CPU time is 1.78× NoPriv's; for B=2048, B′=10, it is 1.03× NoPriv's.
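The shape of the two-stage computation can be seen in the following sketch, which elides all of the cryptography: a small public model, run locally by the client, narrows the B categories to B′ candidates, and only those candidates enter the secure dot-product-plus-argmax protocol, so the garbled argmax shrinks from B inputs to B′. The models here are random stand-ins, and the parameters are scaled down from §6.2's so the sketch runs quickly.

import numpy as np

B, B_prime, N = 256, 10, 5000    # scaled down from B=2048, N=100K in §6.2

public_model  = np.random.rand(B, N)   # stand-in for a model trained on public data
private_model = np.random.rand(B, N)   # stand-in for the provider's proprietary model
email = np.random.rand(N)              # stand-in feature vector

# Stage 1 (client, local, plaintext): pick the B' highest-scoring candidate topics.
candidates = np.argsort(public_model @ email)[-B_prime:]

# Stage 2 (shown here in the clear; in Pretzel this step runs under AHE dot
# products plus a garbled argmax): classify among the candidates only.
scores = private_model[candidates] @ email
print("topic:", int(candidates[np.argmax(scores)]))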

Network transfers. Figure 11 shows the network transfers per email for Baseline and Pretzel. As expected, with decomposed classification, Pretzel's network transfers are lower: 402 KB per email (or 5.4× the average email size of 74 KB, as reported in [13]) for B=2048, B′=20, and 201 KB per email (or 2.7× the average email size) for B=2048, B′=10.

Client-side overheads. Figure 12 shows the model sizes (before feature selection, §4.3) for the various systems for different values of N and for B=2048. Pretzel's model is bigger than Baseline's for two reasons. First, its model comprises an encrypted part that comes from the provider and a public part. Second, the XPIR-BV ciphertexts used in the encrypted part have a larger expansion factor (by a factor of 2) than Paillier ciphertexts.

In terms of client-side CPU time, as in spam filtering, Pretzel (with or without pruning) takes less than half a second to process an email with many features (L=5000).

Loss of accuracy. Recall that classification accuracy for topic extraction in Pretzel could be affected by feature selection and by decomposed classification (§4.3). Figure 13 shows the classification accuracy of topic extraction classifiers trained with and without feature selection, while varying the degree of feature selection (using the Chi-square selection technique [111]). It appears that even after a high degree of feature selection, accuracy drops only modestly below its peak. (Feature selection would also reduce the client-side storage cost presented in Figure 12.)

Figure 14 shows the variation in classification accuracy due to decomposed classification (§4.3). (The depicted data are for the RCV1 dataset and the NB classifier; the qualitative results are similar for the other datasets and classifiers.)


[Figure 13: line plot; y-axis: classification accuracy (percentage, 50–100); x-axis: fraction of the total number of features retained (N′/N, 0–1); series: LR, SVM, and NB on the Reuters, RCV1, and 20 Newsgroups datasets.]

Figure 13—Classification accuracy of topic extraction classifiers in Pretzel as a function of the degree of feature selection N′/N (§4.3). (N is the total number of features in the training part of the datasets, and N′ is the number of selected features.) (The 20News and Reuters datasets come pre-split into training and testing parts, 60%/40% and 75%/25% respectively; we randomly split RCV1 into 70%/30% training/testing portions.) Pretzel can operate at a point where the number of selected features N′ is roughly 25% of N; this would result in only a marginal drop in accuracy.

        Percentage of the total training dataset
        1%     2%     5%     10%
B′=5    79.6   84.0   90.1   94.0
B′=10   89.6   92.1   95.6   97.7
B′=20   95.9   97.3   98.5   99.3
B′=40   98.7   99.3   99.8   99.9

Figure 14—Impact of decomposed classification (§4.3) on classification accuracy for the RCV1 dataset with 296 topics. The columns (except the first) correspond to the percentage of the total training dataset used to train the (public) model that extracts candidate topics. The rows correspond to the number of candidate topics (B′). The cells contain the percentage of test documents for which the "true category" (according to a classifier trained on the entire training dataset) is contained in the candidate topics. Higher percentages are better; 100% is ideal.

The data suggest that an effective non-proprietary classifier can be trained using a small fraction of the training data, for only a small loss in end-to-end accuracy.

6.3 Keyword search and absolute costs

Figure 15 shows the client-side storage and CPU costs of Pretzel's keyword search module (§5).

We now consider the preceding costs in absolute terms, with a view toward whether they would be acceptable in a deployment. We consider an average user who receives 150 emails daily [119] of average size (75 KB) [13], and who owns a mobile device with 32 GB of storage.

To spam filter a long email, the client takes 358 ms, which adds up to less than a minute daily. As for the encrypted model, one with 5M features occupies 183.5 MB, or 0.5% of the device's storage. For network overheads, each email transfers an extra 19.33 KB, which is 2.8 MB daily.

For topic extraction, the client uses less than half a second of CPU per email (or less than 75 s daily); a model with 2,048 categories (close to Google's) and 20K features occupies 720.7 MB, or 2.2% of the device's storage (this can be reduced further using feature selection); and a B′=20 pruning scheme transfers an extra 59 MB (5.4× the size of the emails) over the network daily.

                           index size   query time   update time
Ling-spam                  5.2 MB       0.32 ms      0.18 ms
Enron                      27.2 MB      0.49 ms      0.1 ms
20 Newsgroups              23.9 MB      0.3 ms       0.12 ms
Reuters-21578              6.0 MB       0.28 ms      0.06 ms
Gmail inbox (40K emails)   50.4 MB      0.13 ms      0.12 ms

Figure 15—Client-side search index sizes, CPU times to query a keyword in the indexes (i.e., retrieve the list of emails that contain the keyword), and CPU times to index a new email.

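The daily figures above follow from straightforward arithmetic over the paper's measurements; the snippet below spells it out. The 32 GB device size is the assumption stated earlier, and rounding accounts for the small difference from the quoted 0.5%.

EMAILS_PER_DAY = 150          # [119]
SPAM_CPU_S     = 0.358        # client CPU per long email, spam filtering
SPAM_NET_KB    = 19.33        # extra network transfer per email, spam filtering
TOPIC_NET_KB   = 402          # extra transfer per email, topic extraction (B'=20)
MODEL_MB       = 183.5        # encrypted spam model, N=5M
DEVICE_MB      = 32 * 1024    # assumed 32 GB device

print("spam CPU/day:  %.1f s" % (EMAILS_PER_DAY * SPAM_CPU_S))           # ~53.7 s
print("spam net/day:  %.1f MB" % (EMAILS_PER_DAY * SPAM_NET_KB / 1024))  # ~2.8 MB
print("topic net/day: %.1f MB" % (EMAILS_PER_DAY * TOPIC_NET_KB / 1024)) # ~58.9 MB
print("spam model:    %.2f%% of device" % (100 * MODEL_MB / DEVICE_MB))  # ~0.56%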

Overall, these costs are certainly substantial, and we do not mean to diminish that issue, but we believe that the magnitudes in question are still within tolerance for most users.

7 Discussion, limitations, and future work

Pretzel is an improvement over its baseline (§3.3) of up to 100×, depending on the resource (§6). Its absolute overheads are substantial but, as just discussed (§6.3), within the realm of plausibility.

Pretzel's prototype has many limitations. It does not handle services beyond what we presented; extending Pretzel to other functions (predictive personal assistance, virus scanning, etc.) is future work. So is extending Pretzel to hide metadata, perhaps by composing it with anonymity systems and other secure computation primitives.

A fundamental limitation of Pretzel is information leakage (§2.1, §4.4). As discussed in Section 4.4, the principal concerns surround the privacy of providers' models (for spam) and of clients' emails (during topic extraction). While we do not have technical defenses, we note that participation in topic extraction is voluntary for the client; in fact, a client can opt out with plausible deniability (§4.4). Moreover, providers cannot expose all, or even a substantial fraction of, clients this way, as that would forfeit the original purpose of topic extraction. Nevertheless, we are aware that, defaults being defaults, most clients would not opt out, which means that particular clients could indeed be targeted by a sufficiently motivated adversarial provider.

If Pretzel were widely deployed, we would need a way to derive and retrain models. This is a separate problem with existing research [53, 54, 76, 110, 115, 116, 118, 127, 129–131, 136–138]; an avenue of future work for Pretzel would be to incorporate these works.

There are many other obstacles between the status quo and default end-to-end encryption. In general, it is hard to modify a communication medium as entrenched as email [52]. On the other hand, there is reason for hope: TLS between data centers was deployed over just several years [50]. Another obstacle is key management and usability: how do users share keys across devices and find each other's keys? This too is difficult, but there is recent research and commercial attention [2, 90, 120]. Finally, politics: there are entrenched interests who would prefer email not to be encrypted.

Ultimately, our goal is just to demonstrate an alternative. We don't claim that Pretzel is an optimal point in the three-way tradeoff among functionality, performance, and privacy (§2.1); we don't yet know what such an optimum would be. We simply claim that it is different from the status quo (which combines rich functionality and superb performance, but no encryption by default) and that it is potentially plausible.

A Some details of Naive Bayes

A.1 Spam classification

Section 3.1 stated that the GR-NB algorithm labels a document (email) as spam if $\alpha = 1/p(\text{spam} \mid \vec{x}) - 1$ is greater than some fixed threshold, and that $\log \alpha$ can be expressed as

$$\log \alpha = \Bigl( \sum_{i=1}^{N} x_i \cdot \log p(t_i \mid C_2) + 1 \cdot \log p(C_2) \Bigr) - \Bigl( \sum_{i=1}^{N} x_i \cdot \log p(t_i \mid C_1) + 1 \cdot \log p(C_1) \Bigr), \qquad (3)$$

where the spam category is represented by $C_1$ and non-spam by $C_2$. Here we show this derivation.

From Bayes' rule:

$$p(C_1 \mid \vec{x}) = \frac{p(\vec{x} \mid C_1) \cdot p(C_1)}{p(\vec{x})}.$$

Rewriting:

$$\frac{1}{p(C_1 \mid \vec{x})} - 1 = \frac{p(\vec{x})}{p(\vec{x} \mid C_1) \cdot p(C_1)} - 1.$$

Expanding $p(\vec{x})$ to $p(\vec{x} \mid C_1) \cdot p(C_1) + p(\vec{x} \mid C_2) \cdot p(C_2)$, the right-hand side becomes

$$\frac{p(\vec{x} \mid C_2) \cdot p(C_2)}{p(\vec{x} \mid C_1) \cdot p(C_1)}. \qquad (4)$$

The Naive Bayes assumption says that the presence (or absence) of a feature is independent of the other features, which means

$$p(\vec{x} \mid C_j) = \prod_{i=1}^{N} p(t_i \mid C_j)^{x_i}.$$

Plugging the above equation into equation (4):

$$\alpha = \frac{\prod_{i=1}^{N} p(t_i \mid C_2)^{x_i} \cdot p(C_2)}{\prod_{i=1}^{N} p(t_i \mid C_1)^{x_i} \cdot p(C_1)}.$$

In log space, the expression above is equation (3).
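For concreteness, here is a direct transcription of equation (3); the toy probabilities are illustrative placeholders for the provider's model parameters, which in Pretzel the client only ever handles in encrypted form.

import math

def log_alpha(x, log_p_t_C1, log_p_t_C2, log_p_C1, log_p_C2):
    """x[i] is feature i's count; C1 is spam, C2 is non-spam, as in A.1."""
    s2 = sum(xi * lp for xi, lp in zip(x, log_p_t_C2)) + log_p_C2
    s1 = sum(xi * lp for xi, lp in zip(x, log_p_t_C1)) + log_p_C1
    return s2 - s1   # compared against the fixed threshold of §3.1

# toy model with N = 3 features
lp1 = [math.log(p) for p in (0.5, 0.3, 0.2)]   # p(t_i | C1)
lp2 = [math.log(p) for p in (0.2, 0.3, 0.5)]   # p(t_i | C2)
print(log_alpha([2, 0, 1], lp1, lp2, math.log(0.4), math.log(0.6)))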

A.2 Multinomial text classification

Section 3.1 stated that the multinomial NB algorithm identifies the category $C_{j^*}$, where $j^* = \operatorname{argmax}_j \, p(C_j \mid \vec{x})$. The section stated that the maximal $p(C_j \mid \vec{x})$ can be determined by examining

$$\Bigl( \sum_{i=1}^{N} x_i \cdot \log p(t_i \mid C_j) \Bigr) + 1 \cdot \log p(C_j)$$

for each $j$. Here we give a derivation of this fact. From Bayes' rule:

$$p(C_j \mid \vec{x}) = \frac{p(\vec{x} \mid C_j) \cdot p(C_j)}{p(\vec{x})}. \qquad (5)$$

The multinomial Naive Bayes algorithm assumes that a document, email, or feature vector $\vec{x}$ is formed by picking, with replacement, $L = \sum_{i=1}^{N} x_i$ features from the set of $N$ features in the model. Then, $p(\vec{x} \mid C_j)$ is modeled as a multinomial distribution:

$$p(\vec{x} \mid C_j) = p(L) \cdot \frac{L!}{\prod_{i=1}^{N} x_i!} \cdot \prod_{i=1}^{N} p(t_i \mid C_j)^{x_i}.$$

Plugging the above equation into equation (5):

$$p(C_j \mid \vec{x}) = \frac{1}{p(\vec{x})} \cdot p(L) \cdot \frac{L!}{\prod_{i=1}^{N} x_i!} \cdot \prod_{i=1}^{N} p(t_i \mid C_j)^{x_i} \cdot p(C_j).$$

Discounting the terms that are independent of $j$ (which we are free to do because the maximum is unaffected by such terms), the expression above becomes

$$\prod_{i=1}^{N} p(t_i \mid C_j)^{x_i} \cdot p(C_j),$$

which, when converted to log space, is the claimed expression.
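The per-category score above, computed for every $j$ and maximized, is exactly the quantity Pretzel evaluates under AHE dot products plus a garbled argmax; a plaintext transcription follows, with toy numbers standing in for the model.

import math

def best_category(x, log_cond, log_priors):
    # log_cond[j][i] = log p(t_i | C_j); log_priors[j] = log p(C_j)
    scores = [sum(xi * lp for xi, lp in zip(x, row)) + prior
              for row, prior in zip(log_cond, log_priors)]
    return max(range(len(scores)), key=scores.__getitem__)

log_cond = [[math.log(p) for p in row]
            for row in ((0.7, 0.2, 0.1), (0.1, 0.2, 0.7))]   # two categories
log_priors = [math.log(0.5), math.log(0.5)]
print(best_category([3, 1, 0], log_cond, log_priors))   # -> 0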

References

[1] https://www.gnupg.org.
[2] https://keybase.io.
[3] http://spamprobe.sourceforge.net/.
[4] http://spambayes.sourceforge.net/.
[5] http://spamassassin.apache.org/.
[6] http://scikit-learn.org/stable/.
[7] http://www.cs.waikato.ac.nz/ml/weka/.
[8] https://support.google.com/ads/answer/2842480?hl=en.
[9] https://www.sqlite.org/fts3.html.
[10] https://www.cs.cmu.edu/~./enron/.
[11] http://qwone.com/~jason/20Newsgroups/.
[12] http://www.daviddlewis.com/resources/testcollections/reuters21578/.
[13] http://email.about.com/od/emailstatistics/f/What_is_the_Average_Size_of_an_Email_Message.htm.
[14] http://www.gossamer-threads.com/lists/spamassassin/users/151578.
[15] http://users.spamassassin.apache.narkive.com/d6ppUDfw/large-scale-global-bayes-tuning.
[16] http://spamassassin.apache.org/full/3.4.x/doc/Mail_SpamAssassin_Conf.html.
[17] A survey on Ring-LWE cryptography, Feb. 2016. https://www.microsoft.com/en-us/research/video/a-survey-on-ring-lwe-cryptography/.
[18] D. Aberdeen, O. Pacovsky, and A. Slater. The learning behind Gmail priority inbox. In NIPS Workshop on Learning on Cores, Clusters and Clouds (LCCC), 2010.


[19] P. Aditya, R. Sen, P. Druschel, S. J. Oh, R. Benenson, M. Fritz, B. Schiele, B. Bhattacharjee, and T. T. Wu. I-Pic: A platform for privacy-compliant image capture. In International Conference on Mobile Systems, Applications, and Services (MobiSys), 2016.
[20] C. Aguilar-Melchor, J. Barrier, L. Fousse, and M.-O. Killijian. XPIR: Private information retrieval for everyone. In Privacy Enhancing Technologies Symposium (PETS), 2016.
[21] A. Amirbekyan and V. Estivill-Castro. A new efficient privacy-preserving scalar product protocol. In Australasian Conference on Data Mining and Analytics (AusDM), 2007.
[22] I. Androutsopoulos, J. Koutsias, K. Chandrinos, G. Paliouras, and C. Spyropoulos. An evaluation of Naive Bayesian anti-spam filtering. In Workshop on Machine Learning in the New Information Age, 2000.
[23] Apple. Privacy - Approach to Privacy - Apple. http://www.apple.com/privacy/approach-to-privacy/.
[24] Machine Learning Group at National Taiwan University. LIBLINEAR—a library for large linear classification. https://www.csie.ntu.edu.tw/~cjlin/liblinear/.
[25] M. J. Atallah and W. Du. Secure multi-party computational geometry. In Algorithms and Data Structures, pages 165–179. Springer, 2001.
[26] D. Beeby. Rogue tax workers snooped on ex-spouses, family members. Toronto Star, June 2010. https://www.thestar.com/news/canada/2010/06/20/rogue_tax_workers_snooped_on_exspouses_family_members.html.
[27] E. Betters. What is Google Assistant, how does it work, and when can you use it?, Sept. 2016. http://www.pocket-lint.com/news/137722-what-is-google-assistant-how-does-it-work-and-when-can-you-use-it.
[28] M. Blanton and P. Gasti. Secure and efficient protocols for iris and fingerprint identification. In European Symposium on Research in Computer Security (ESORICS), pages 190–209, 2011.
[29] B. E. Boser, I. M. Guyon, and V. N. Vapnik. A training algorithm for optimal margin classifiers. In Workshop on Computational Learning Theory (COLT), pages 144–152, 1992.
[30] R. Bost, R. A. Popa, S. Tu, and S. Goldwasser. Machine learning classification over encrypted data. In Network and Distributed System Security Symposium (NDSS), 2014.
[31] Z. Brakerski and V. Vaikuntanathan. Fully homomorphic encryption from ring-LWE and security for key dependent messages. In Advances in Cryptology—CRYPTO, pages 505–524. Springer, 2011.
[32] Z. Brakerski and V. Vaikuntanathan. Fully homomorphic encryption from ring-LWE and security for key dependent messages. In Advances in Cryptology—CRYPTO, 2011.
[33] J. Bringer, O. El Omri, C. Morel, and H. Chabanne. Boosting GSHADE capabilities: New applications and security in malicious setting. In Symposium on Access Control Models and Technologies (SACMAT), pages 203–214, 2016.
[34] D. Cash, J. Jaeger, S. Jarecki, C. Jutla, H. Krawczyk, M.-C. Rosu, and M. Steiner. Dynamic searchable encryption in very-large databases: Data structures and implementation. In Network and Distributed System Security Symposium (NDSS), 2014.
[35] D. Cash, S. Jarecki, C. Jutla, H. Krawczyk, M. Rosu, and M. Steiner. Highly-scalable searchable symmetric encryption with support for boolean queries. In Advances in Cryptology—CRYPTO, 2013.
[36] Y.-T. Chiang, D.-W. Wang, C.-J. Liau, and T.-s. Hsu. Secrecy of two-party secure computation. In Working Conference on Data and Applications Security (DBSec), pages 114–123. Springer, 2005.
[37] K. Conger. Google engineer says he'll push for default end-to-end encryption in Allo, May 2016. https://techcrunch.com/2016/05/19/google-engineer-says-hell-push-for-default-end-to-end-encryption-in-allo/.
[38] K. Conger. Google's Allo won't include end-to-end encryption by default, May 2016. https://techcrunch.com/2016/05/18/googles-allo-wont-include-end-to-end-encryption-by-default/.
[39] G. V. Cormack. TREC 2007 spam track overview. In Text REtrieval Conference (TREC), 2007.
[40] C. Cortes and V. Vapnik. Support-vector networks. Machine Learning, 20(3):273–297, 1995.
[41] T. Dierks and E. Rescorla. The transport layer security (TLS) protocol version 1.2. RFC 5246, Network Working Group, Aug. 2008.
[42] W. Diffie and M. Hellman. New directions in cryptography. IEEE Transactions on Information Theory, 22(6):644–654, 1976.
[43] C. Dong and L. Chen. A fast secure dot product protocol with application to privacy preserving association rule mining. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 2014.
[44] W. Du and M. J. Atallah. Privacy-preserving cooperative scientific computations. In Computer Security Foundations Workshop (CSFW), 2001.
[45] W. Du and M. J. Atallah. Privacy-preserving cooperative statistical analysis. In Annual Computer Security Applications Conference (ACSAC), 2001.
[46] W. Du and M. J. Atallah. Protocols for secure remote database access with approximate matching. In E-Commerce Security and Privacy, 2001.
[47] W. Du and Z. Zhan. Building decision tree classifier on private data. In International Conference on Data Mining Workshop on Privacy, Security and Data Mining (PSDM), 2002.
[48] W. Du and Z. Zhan. A practical approach to solve secure multi-party computation problems. In New Security Paradigms Workshop (NSPW), 2002.
[49] T. Duong. Security and privacy in Google Allo, May 2016. https://vnhacker.blogspot.com/2016/05/security-and-privacy-in-google-allo.html.
[50] Z. Durumeric, D. Adrian, A. Mirian, J. Kasten, E. Bursztein, N. Lidzborski, K. Thomas, V. Eranti, M. Bailey, and J. A. Halderman. Neither snow nor rain nor MITM...: An empirical analysis of email delivery security. In ACM SIGCOMM Conference on Internet Measurement (IMC), 2015.
[51] R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9(Aug):1871–1874, 2008.
[52] L. Franceschi-Bicchierai. Even the inventor of PGP doesn't use PGP, 2015. http://motherboard.vice.com/read/even-the-inventor-of-pgp-doesnt-use-pgp.
[53] A. Gangrade and R. Patel. Privacy preserving Naive Bayes classifier for horizontally distribution scenario using un-trusted third party. IOSR Journal of Computer Engineering (IOSRJCE), pages 4–12, 2012.
[54] A. Gangrade and R. Patel. Privacy preserving three-layer Naive Bayes classifier for vertically partitioned databases. Journal of Information and Computing Science, pages 119–129, 2013.

[55] B. Goethals, S. Laur, H. Lipmaa, and T. Mielikainen. On private scalar product computation for privacy-preserving data mining. In International Conference on Information Security and Cryptology (ICISC), 2004.
[56] O. Goldreich, S. Micali, and A. Wigderson. How to play any mental game. In ACM Symposium on Theory of Computing (STOC), 1987.
[57] J. Goodman and W.-t. Yih. Online discriminative spam filter training. In CEAS, pages 1–4, 2006.
[58] Google. https://github.com/google/end-to-end.
[59] Google. Google transparency report. https://www.google.com/transparencyreport/userdatarequests/US/.
[60] Google. How Gmail ads work. https://support.google.com/mail/answer/6603?hl=en.
[61] S. D. Gordon, J. Katz, V. Kolesnikov, F. Krell, T. Malkin, M. Raykova, and Y. Vahlis. Secure two-party computation in sublinear (amortized) time. In ACM Conference on Computer and Communications Security (CCS), 2012.
[62] J. Gould. The natural history of Gmail data mining. Gmail isn't really about email—it's a gigantic profiling machine. Medium, June 2014. https://medium.com/@jeffgould/the-natural-history-of-gmail-data-mining-be115d196b10.
[63] P. Graham. A plan for spam, 2002. http://www.paulgraham.com/spam.html.
[64] P. Graham. Better Bayesian filtering, 2003. http://www.paulgraham.com/better.html.
[65] J. Huang, J. Lu, and C. X. Ling. Comparing Naive Bayes, decision trees, and SVM with AUC and accuracy. In IEEE International Conference on Data Mining (ICDM), pages 553–556, 2003.
[66] Y. Huang, D. Evans, J. Katz, and L. Malka. Faster secure two-party computation using garbled circuits. In USENIX Security Symposium (SEC), 2011.
[67] Y. Huang, J. Katz, and D. Evans. Quid-pro-quo-tocols: Strengthening semi-honest protocols with dual execution. In IEEE Symposium on Security and Privacy (S&P), 2012.
[68] Y. Huang, Z. Lu, et al. Privacy preserving association rule mining with scalar product. In International Conference on Natural Language Processing and Knowledge Engineering (NLP-KE), 2005.
[69] Y. Huang, L. Malka, D. Evans, and J. Katz. Efficient privacy-preserving biometric identification. In Network and Distributed System Security Symposium (NDSS), 2011.
[70] I. Ioannidis, A. Grama, and M. Atallah. A secure protocol for computing dot-products in clustered and distributed environments. In International Conference on Parallel Processing (ICPP), 2002.
[71] Y. Ishai, J. Kilian, K. Nissim, and E. Petrank. Extending oblivious transfers efficiently. In Advances in Cryptology—CRYPTO, pages 145–161. Springer, 2003.
[72] S. Jarecki, C. Jutla, H. Krawczyk, M. Rosu, and M. Steiner. Outsourced symmetric private information retrieval. In ACM Conference on Computer and Communications Security (CCS), 2013.
[73] T. Joachims. Text categorization with support vector machines: Learning with many relevant features. In European Conference on Machine Learning, pages 137–142, 1998.
[74] C. Kaleli and H. Polat. Providing Naive Bayesian classifier-based private recommendations on partitioned data. In European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD), pages 515–522, 2007.
[75] J.-S. Kang and D. Hong. On fast private scalar product protocols. In Security Technology (SecTech), pages 1–10. Springer, 2011.
[76] M. Kantarcıoglu, J. Vaidya, and C. Clifton. Privacy preserving Naive Bayes classifier for horizontally partitioned data. In ICDM Workshop on Privacy Preserving Data Mining (PPDM), 2003.
[77] M. Keller, E. Orsini, and P. Scholl. Actively secure OT extension with optimal overhead. In Advances in Cryptology—CRYPTO, pages 724–741, 2015.
[78] A. Khedr, G. Gulak, and V. Vaikuntanathan. SHIELD: Scalable homomorphic implementation of encrypted data-classifiers. IEEE Transactions on Computers, 65(9):2848–2858, 2014.
[79] B. Kreuter, A. Shelat, and C.-H. Shen. Billion-gate secure computation with malicious adversaries. In USENIX Security Symposium (SEC), pages 285–300, 2012.
[80] S. Laur and H. Lipmaa. On private similarity search protocols. In Nordic Workshop on Secure IT Systems (NordSec), 2004.
[81] D. D. Lewis, Y. Yang, T. G. Rose, and F. Li. RCV1: A new benchmark collection for text categorization research. Journal of Machine Learning Research, 5(Apr):361–397, 2004.
[82] C.-J. Lin, R. C. Weng, and S. S. Keerthi. Trust region Newton method for logistic regression. Journal of Machine Learning Research, 9(Apr):627–650, 2008.
[83] Y. Lindell. Fast cut-and-choose-based protocols for malicious and covert adversaries. Journal of Cryptology, 29(2):456–490, 2016.
[84] Y. Lindell and B. Pinkas. A proof of security of Yao's protocol for two-party computation. Journal of Cryptology, 22(2):161–188, 2009.
[85] X. Liu, R. Lu, J. Ma, L. Chen, and B. Qin. Privacy-preserving patient-centric clinical decision support system on Naive Bayesian classification. IEEE Journal of Biomedical and Health Informatics, 20(2):655–668, 2016.
[86] Z. Liu, H. Seo, S. S. Roy, J. Großschadl, H. Kim, and I. Verbauwhede. Efficient Ring-LWE encryption on 8-bit AVR processors. In International Workshop on Cryptographic Hardware and Embedded Systems (CHES), pages 663–682, 2015.
[87] V. Lyubashevsky, C. Peikert, and O. Regev. On ideal lattices and learning with errors over rings. In International Conference on the Theory and Applications of Cryptographic Techniques, pages 1–23. Springer, 2010.
[88] V. Lyubashevsky, C. Peikert, and O. Regev. A toolkit for ring-LWE cryptography. In Annual International Conference on the Theory and Applications of Cryptographic Techniques, pages 35–54. Springer, 2013.
[89] A. McCallum, K. Nigam, et al. A comparison of event models for Naive Bayes text classification. In AAAI Workshop on Learning for Text Categorization, volume 752, pages 41–48, 1998.
[90] M. S. Melara, A. Blankstein, J. Bonneau, E. W. Felten, and M. J. Freedman. CONIKS: Bringing key transparency to end users. In USENIX Security Symposium (SEC), pages 383–398, 2015.
[91] R. C. Merkle. Secure communications over insecure channels. Communications of the ACM, 21(4):294–299, Apr. 1978.


[92] V. Metsis, I. Androutsopoulos, and G. Paliouras. Spam filtering with Naive Bayes—which Naive Bayes? In CEAS, pages 27–28, 2006.
[93] T. Meyer. No warrant, no problem: How the government can get your digital data. ProPublica, June 2014. https://www.propublica.org/special/no-warrant-no-problem-how-the-government-can-still-get-your-digital-data/.
[94] Microsoft. Law enforcement requests report. https://www.microsoft.com/about/csr/transparencyhub/lerr/.
[95] A. Y. Ng and M. I. Jordan. On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes. Advances in Neural Information Processing Systems, 2:841–848, 2002.
[96] P. Paillier. Public-key cryptosystems based on composite degree residuosity classes. In Annual International Conference on the Theory and Applications of Cryptographic Techniques (EUROCRYPT), 1999.
[97] V. Pappas, F. Krell, B. Vo, V. Kolesnikov, T. Malkin, S. G. Choi, W. George, A. Keromytis, and S. Bellovin. Blind Seer: A scalable private DBMS. In IEEE Symposium on Security and Privacy (S&P), 2014.
[98] M. A. Pathak, M. Sharifi, and B. Raj. Privacy preserving spam filtering. arXiv preprint arXiv:1102.4021, 2011.
[99] K. Poulsen. Five IRS employees charged with snooping on tax returns. Wired, May 2008. https://www.wired.com/2008/05/five-irs-employ/.
[100] Electronic Communications Privacy Act of 1986, Oct. 1986. https://www.gpo.gov/fdsys/pkg/STATUTE-100/pdf/STATUTE-100-Pg1848.pdf.
[101] G. Robinson. A statistical approach to the spam problem. Linux Journal, Mar. 2003. http://www.linuxjournal.com/article/6467.
[102] A.-R. Sadeghi, T. Schneider, and I. Wehrenberg. Efficient privacy-preserving face recognition. In International Conference on Information Security and Cryptology (ICISC), pages 229–244. Springer, 2009.
[103] A.-R. Sadeghi, T. Schneider, and I. Wehrenberg. Efficient privacy-preserving face recognition (full version). Cryptology ePrint Archive, Report 2009/507, 2009.
[104] D. Sculley and G. M. Wachman. Relaxed online SVMs for spam filtering. In ACM SIGIR Conference on Research and Development in Information Retrieval, pages 415–422, 2007.
[105] M. Shaneck and Y. Kim. Efficient cryptographic primitives for private data mining. In Hawaii International Conference on System Sciences (HICSS), 2010.
[106] C. Soghoian. Two honest Google employees: our products don't protect your privacy, Nov. 2011. http://paranoia.dubfire.net/2011/11/two-honest-google-employees-our.html.
[107] S. Somogyi. Making end-to-end encryption easier to use. Google Security Blog, June 2014. https://security.googleblog.com/2014/06/making-end-to-end-encryption-easier-to.html.
[108] E. M. Songhori, S. U. Hussain, A.-R. Sadeghi, T. Schneider, and F. Koushanfar. TinyGarble: Highly compressed and scalable sequential garbled circuits. In IEEE Symposium on Security and Privacy (S&P), pages 411–428, 2015.
[109] A. Stamos. User-focused security: End-to-end encryption extension for Yahoo Mail. Yahoo Tumblr Blog, Mar. 2015. https://yahoo.tumblr.com/post/113708033335/user-focused-security-end-to-end-encryption.
[110] M. Sumana and K. Hareesha. Privacy preserving Naive Bayes classifier for horizontally partitioned data using secure division. International Journal of Network Security & Its Applications, 6(6):17, 2014.
[111] B. Tang, S. Kay, and H. He. Toward optimal feature selection in Naive Bayes for text categorization. IEEE Transactions on Knowledge and Data Engineering, 28(9):2508–2521, 2016.
[112] F. Tramer, F. Zhang, A. Juels, M. K. Reiter, and T. Ristenpart. Stealing machine learning models via prediction APIs. In USENIX Security Symposium (SEC), 2016.
[113] D. Trinca and S. Rajasekaran. Fast cryptographic multi-party protocols for computing boolean scalar products with applications to privacy-preserving association rule mining in vertically partitioned data. In Data Warehousing and Knowledge Discovery (DaWaK), pages 418–427, 2007.
[114] J. Vaidya and C. Clifton. Privacy preserving association rule mining in vertically partitioned data. In ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2002.
[115] J. Vaidya and C. Clifton. Privacy preserving Naive Bayes classifier for vertically partitioned data. In SIAM International Conference on Data Mining (SDM), 2004.
[116] J. Vaidya, M. Kantarcıoglu, and C. Clifton. Privacy-preserving Naive Bayes classification. The VLDB Journal, 17(4):879–898, 2008.
[117] J. Vaidya, B. Shafiq, A. Basu, and Y. Hong. Differentially private Naive Bayes classification. In IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT), pages 571–576, 2013.
[118] J. Vaidya, H. Yu, and X. Jiang. Privacy-preserving SVM classification. Knowledge and Information Systems, 14(2):161–178, 2008.
[119] L. Vanderkam. Stop checking your email, now. Fortune, Oct. 2012. http://fortune.com/2012/10/08/stop-checking-your-email-now/.
[120] WhatsApp. WhatsApp FAQ - End-to-end encryption. https://www.whatsapp.com/faq/en/general/28030015.
[121] Wikipedia. 2016 Democratic National Committee email leak—Wikipedia, The Free Encyclopedia, 2014. https://en.wikipedia.org/wiki/2016_Democratic_National_Committee_email_leak.
[122] Wikipedia. Sony Pictures Entertainment hack—Wikipedia, The Free Encyclopedia, 2014. https://en.wikipedia.org/wiki/Sony_Pictures_Entertainment_hack.
[123] R. Wright and Z. Yang. Privacy-preserving Bayesian network structure computation on distributed heterogeneous data. In ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2004.
[124] F. Xu, S. Zeng, S. Luo, C. Wang, Y. Xin, and Y. Guo. Research on secure scalar product protocol and its application. In International Conference on Wireless Communications, Networking and Mobile Computing (WiCOM), 2010.
[125] Yahoo! https://github.com/yahoo/end-to-end.
[126] Yahoo! Transparency report: Overview. https://transparency.yahoo.com/.
[127] Z. Yang, S. Zhong, and R. N. Wright. Privacy-preserving classification of customer data without loss of accuracy. In SIAM International Conference on Data Mining (SDM), pages 92–102. SIAM, 2005.
[128] A. C. Yao. Protocols for secure computations. In Symposium on Foundations of Computer Science (SFCS), 1982.
[129] X. Yi and Y. Zhang. Privacy-preserving Naive Bayes classification on distributed data via semi-trusted mixers. Information Systems, 34(3):371–380, 2009.

[130] H. Yu, X. Jiang, and J. Vaidya. Privacy-preserving SVM using nonlinear kernels on horizontally partitioned data. In ACM Symposium on Applied Computing, pages 603–610, 2006.
[131] H. Yu, J. Vaidya, and X. Jiang. Privacy-preserving SVM classification on vertically partitioned data. In Pacific-Asia Conference on Knowledge Discovery and Data Mining, pages 647–656, 2006.
[132] S. Zahur and D. Evans. Obliv-C: A language for extensible data-oblivious computation. Cryptology ePrint Archive, Report 2015/1153, 2015.
[133] S. Zahur, M. Rosulek, and D. Evans. Two halves make a whole. In Annual International Conference on the Theory and Applications of Cryptographic Techniques (EUROCRYPT), 2015.
[134] S. Zahur, X. Wang, M. Raykova, A. Gascon, J. Doerner, D. Evans, and J. Katz. Revisiting square-root ORAM: Efficient random access in multi-party computation. In IEEE Symposium on Security and Privacy (S&P), 2016.
[135] K. Zetter. Ex-Googler allegedly spied on user e-mails, chats, Sept. 2010. https://www.wired.com/2010/09/google-spy/.
[136] J. Zhan, S. Matwin, and L. Chang. Privacy-preserving Naive Bayesian classification over vertically partitioned data. In IEEE ICDM Workshop on Foundations of Semantic Oriented Data and Web Mining, pages 27–30. Citeseer, 2005.
[137] J. Zhan, S. Matwin, and L. W. Chang. Privacy-preserving Naive Bayesian classification over horizontally partitioned data. In Data Mining: Foundations and Practice, pages 529–538. Springer, 2008.
[138] Z. Zhan, L. Chang, and S. Matwin. Privacy-preserving Naive Bayesian classification. Technical report, DTIC Document, 2004.
[139] H. Zhang. The optimality of Naive Bayes. AA, 1(2):3, 2004.
[140] P. Zhang, Y. Tong, S. Tang, and D. Yang. Privacy preserving Naive Bayes classification. In Advanced Data Mining and Applications, pages 744–752. Springer, 2005.
[141] Y. Zhu and T. Takagi. Efficient scalar product protocol and its privacy-preserving application. International Journal of Electronic Security and Digital Forensics (IJESDF), 7(1):1–19, 2015.
[142] Y. Zhu, T. Takagi, and L. Huang. Efficient secure primitive for privacy preserving distributed computations. In Advances in Information and Computer Security: International Workshop on Security (IWSEC), 2012.
[143] Y. Zhu, Z. Wang, B. Hassan, Y. Zhang, J. Wang, and C. Qian. Fast secure scalar product protocol with (almost) optimal efficiency. In Collaborative Computing: Networking, Applications, and Worksharing (CollaborateCom), 2015.
[144] P. R. Zimmermann. The official PGP user's guide. MIT Press, 1995.
