swift: super-fast and robust privacy-preserving machine ...nishat koti , mahak pancholi , arpita...

29
SWIFT: Super-fast and Robust Privacy-Preserving Machine Learning Nishat Koti * , Mahak Pancholi * , Arpita Patra * , Ajith Suresh * * Department of Computer Science and Automation, Indian Institute of Science, Bangalore India {kotis, mahakp, arpita, ajith}@iisc.ac.in Abstract—Performing ML computation on private data while maintaining data privacy aka Privacy-preserving Machine Learn- ing (PPML) is an emergent field of research. Recently, PPML has seen a visible shift towards the adoption of Secure Outsourced Computation (SOC) paradigm, due to the heavy computation that it entails. In the SOC paradigm, computation is outsourced to a set of powerful and specially equipped servers that provide service on a pay-per-use basis. In this work, we propose SWIFT, a robust PPML framework for a range of ML algorithms in SOC setting, that guarantees output delivery to the users irrespective of any adversarial behaviour. Robustness, a highly desirable feature, evokes user participation without the fear of denial of service. At the heart of our framework lies a highly-efficient, maliciously-secure, three-party computation (3PC) over rings that provides guaranteed output delivery (GOD) in the honest- majority setting. To the best of our knowledge, SWIFT is the first robust and efficient PPML framework in the 3PC setting. SWIFT is as fast as the best-known 3PC framework BLAZE (Patra et al. NDSS’20) which only achieves fairness 1 . We extend our 3PC framework for four parties (4PC). In this regime, SWIFT is as fast as the best known fair 4PC framework Trident (Chaudhari et al. NDSS’20) and twice faster than the best-known robust 4PC framework FLASH (Byali et al. PETS’20). We demonstrate the practical relevance of our framework by benchmarking two important applications– i) ML algorithms: Logistic Regression and Neural Network, and ii) Biometric matching, both over a 64-bit ring in WAN setting. Our readings reflect our claims as above. I. I NTRODUCTION Privacy Preserving Machine Learning (PPML), a booming field of research, allows Machine Learning (ML) computa- tions over private data of users while ensuring the privacy of the data. PPML finds applications in sectors that deal with sensitive/confidential data, e.g. healthcare, finance, and in cases where organisations are prohibited from sharing client information due to privacy laws such as CCPA and GDPR. However, PPML solutions make the already computationally heavy ML algorithms more compute-intensive. An average end-user who lacks the infrastructure required to run these tasks prefers to outsource the computation to a powerful set of specialized cloud servers and leverage their services on a pay-per-use basis. The Secure Outsourced Computation (SOC) paradigm is thus an apt fit for the need of the moment. The goal is to achieve malicious security against the collusion of an arbitrary number of users with some of the servers. Many recent works [1]–[9] exploit Secure Multiparty Computation (MPC) techniques to realize PPML in the SOC setting where 1 Fairness ensures either all or none receive the output, whereas GOD ensures guaranteed output delivery no matter what. the servers enact the role of the parties. Informally, MPC enables n mutually distrusting parties to compute a function over their private inputs, while ensuring the privacy of the same against an adversary controlling up to t parties. Both the training and prediction phases of PPML can be realized in the SOC setting. The common approach of outsourcing followed in the PPML literature, as well as by our work, requires the users to secret-share 2 their inputs between the set of hired (untrusted) servers, who jointly interact and compute the secret-shared output, and reconstruct it towards the users. In a bid to improve practical efficiency, many recent works [6]–[17] cast their protocols into an input-independent preprocessing phase, and input-dependent online phase. Using this paradigm, the input-independent (yet function-dependent) computationally heavy tasks can be computed in the prepro- cessing phase in advance, resulting in a fast online phase. This paradigm suits scenario analogous to PPML setting, where functions (ML algorithms) typically need to be evaluated a large number of times, and the function description is known beforehand. To further enhance practical efficiency by leverag- ing CPU optimizations, recent works [15], [18]–[21] propose MPC protocols that work over 32 or 64 bit rings. Lastly, solutions for a small number of parties have received a huge momentum due to the many cost-effective customizations that they permit, for instance, a cheaper realisation of multiplication through custom-made secret sharing schemes [6]–[9], [22], [23] (that do not scale for a large number of parties). We now motivate the need for robustness aka guaranteed output delivery (GOD) over fairness, or even abort security, in the domain of PPML. Robustness provides the guarantee of output delivery to all protocol participants, no matter how the adversary misbehaves. Robustness is extremely crucial for real- world deployment and usage of PPML techniques. Consider the following scenario wherein an ML model owner wishes to provide inference service. The model owner shares the model parameters between the servers, while the end-users share their queries. A protocol that provides security with abort or fairness will not suffice as in both the cases a malicious adversary can act in a way so that the protocol results in an abort which means that the user will not get the desired output. This leads to denial of service and heavy economic losses for the service provider. For data providers who want to collaboratively build a model on their data, more training data leads to a better, more accurate model, which enables them to provide better ML services and, consequently, attract more clients. A robust framework encourages active involvement from multiple data 2 The threshold of the secret-sharing is decided based on the number of corrupt servers so that privacy is preserved.

Upload: others

Post on 02-Aug-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: SWIFT: Super-fast and Robust Privacy-Preserving Machine ...Nishat Koti , Mahak Pancholi , Arpita Patra , Ajith Suresh Department of Computer Science and Automation, Indian Institute

SWIFT: Super-fast and Robust Privacy-PreservingMachine Learning

Nishat Koti∗, Mahak Pancholi∗, Arpita Patra∗, Ajith Suresh∗∗Department of Computer Science and Automation, Indian Institute of Science, Bangalore India

{kotis, mahakp, arpita, ajith}@iisc.ac.in

Abstract—Performing ML computation on private data whilemaintaining data privacy aka Privacy-preserving Machine Learn-ing (PPML) is an emergent field of research. Recently, PPML hasseen a visible shift towards the adoption of Secure OutsourcedComputation (SOC) paradigm, due to the heavy computationthat it entails. In the SOC paradigm, computation is outsourcedto a set of powerful and specially equipped servers that provideservice on a pay-per-use basis. In this work, we propose SWIFT,a robust PPML framework for a range of ML algorithms in SOCsetting, that guarantees output delivery to the users irrespective ofany adversarial behaviour. Robustness, a highly desirable feature,evokes user participation without the fear of denial of service.

At the heart of our framework lies a highly-efficient,maliciously-secure, three-party computation (3PC) over ringsthat provides guaranteed output delivery (GOD) in the honest-majority setting. To the best of our knowledge, SWIFT is the firstrobust and efficient PPML framework in the 3PC setting. SWIFTis as fast as the best-known 3PC framework BLAZE (Patra etal. NDSS’20) which only achieves fairness1. We extend our 3PCframework for four parties (4PC). In this regime, SWIFT is asfast as the best known fair 4PC framework Trident (Chaudhariet al. NDSS’20) and twice faster than the best-known robust 4PCframework FLASH (Byali et al. PETS’20).

We demonstrate the practical relevance of our framework bybenchmarking two important applications– i) ML algorithms:Logistic Regression and Neural Network, and ii) Biometricmatching, both over a 64-bit ring in WAN setting. Our readingsreflect our claims as above.

I. INTRODUCTION

Privacy Preserving Machine Learning (PPML), a boomingfield of research, allows Machine Learning (ML) computa-tions over private data of users while ensuring the privacyof the data. PPML finds applications in sectors that dealwith sensitive/confidential data, e.g. healthcare, finance, and incases where organisations are prohibited from sharing clientinformation due to privacy laws such as CCPA and GDPR.However, PPML solutions make the already computationallyheavy ML algorithms more compute-intensive. An averageend-user who lacks the infrastructure required to run thesetasks prefers to outsource the computation to a powerful setof specialized cloud servers and leverage their services on apay-per-use basis. The Secure Outsourced Computation (SOC)paradigm is thus an apt fit for the need of the moment. Thegoal is to achieve malicious security against the collusion ofan arbitrary number of users with some of the servers. Manyrecent works [1]–[9] exploit Secure Multiparty Computation(MPC) techniques to realize PPML in the SOC setting where

1Fairness ensures either all or none receive the output, whereas GODensures guaranteed output delivery no matter what.

the servers enact the role of the parties. Informally, MPCenables n mutually distrusting parties to compute a functionover their private inputs, while ensuring the privacy of thesame against an adversary controlling up to t parties. Both thetraining and prediction phases of PPML can be realized in theSOC setting. The common approach of outsourcing followed inthe PPML literature, as well as by our work, requires the usersto secret-share2 their inputs between the set of hired (untrusted)servers, who jointly interact and compute the secret-sharedoutput, and reconstruct it towards the users.

In a bid to improve practical efficiency, many recentworks [6]–[17] cast their protocols into an input-independentpreprocessing phase, and input-dependent online phase. Usingthis paradigm, the input-independent (yet function-dependent)computationally heavy tasks can be computed in the prepro-cessing phase in advance, resulting in a fast online phase. Thisparadigm suits scenario analogous to PPML setting, wherefunctions (ML algorithms) typically need to be evaluated alarge number of times, and the function description is knownbeforehand. To further enhance practical efficiency by leverag-ing CPU optimizations, recent works [15], [18]–[21] proposeMPC protocols that work over 32 or 64 bit rings. Lastly,solutions for a small number of parties have received a hugemomentum due to the many cost-effective customizations thatthey permit, for instance, a cheaper realisation of multiplicationthrough custom-made secret sharing schemes [6]–[9], [22],[23] (that do not scale for a large number of parties).

We now motivate the need for robustness aka guaranteedoutput delivery (GOD) over fairness, or even abort security, inthe domain of PPML. Robustness provides the guarantee ofoutput delivery to all protocol participants, no matter how theadversary misbehaves. Robustness is extremely crucial for real-world deployment and usage of PPML techniques. Considerthe following scenario wherein an ML model owner wishes toprovide inference service. The model owner shares the modelparameters between the servers, while the end-users share theirqueries. A protocol that provides security with abort or fairnesswill not suffice as in both the cases a malicious adversary canact in a way so that the protocol results in an abort whichmeans that the user will not get the desired output. This leadsto denial of service and heavy economic losses for the serviceprovider. For data providers who want to collaboratively builda model on their data, more training data leads to a better,more accurate model, which enables them to provide betterML services and, consequently, attract more clients. A robustframework encourages active involvement from multiple data

2The threshold of the secret-sharing is decided based on the number ofcorrupt servers so that privacy is preserved.

Page 2: SWIFT: Super-fast and Robust Privacy-Preserving Machine ...Nishat Koti , Mahak Pancholi , Arpita Patra , Ajith Suresh Department of Computer Science and Automation, Indian Institute

providers. Hence, for the seamless adoption of PPML solutionsin the real-world, the robustness of the protocol is of utmostimportance. The hall-mark result of [24] suggests that anhonest-majority amongst the servers is necessary to achieverobustness (in fact, to achieve fairness which is a weaker goalthan robustness).

Consequent to the discussion above, we focus on thehonest-majority setting with a small set of parties, especially3 and 4 parties with one corruption, both of which havedrawn enormous attention recently [6]–[9], [22], [23], [25]–[30]. Our protocols work over rings, cast in input-independentand dependent phases, and achieves GOD.

Before we move on to discuss the related work, we statethe challenges for stretching MPC techniques for PPML.

a) Challenges in PPML: MPC protocols, while a poten-tial solution for SOC, cannot be directly plugged in to achievethe goal of PPML. This is because, ML algorithms involvedecimal values that need to be embedded into rings, resultingin doubling of the fractional part after every multiplication.This was handled in previous works via secure truncation [1],[4], [6], [8], [9]. Secondly, ML algorithms require techniquesto efficiently alternate between boolean and arithmetic compu-tations, as some operations like secure comparison are betterrealized in boolean representation, while operations like dotproduct are better realized in arithmetic representation. Thesechallenges call for customized MPC protocols for truncation,comparison, dot product, and many others.

b) Related Work: We restrict the relevant works witha small number of parties and honest-majority, focusing firston MPC, followed by PPML. MPC protocols for a smallpopulation can be cast into orthogonal domains of low la-tency protocols [27], [31], [32], and high throughput protocols[6], [9], [18], [22], [23], [26], [28], [30], [33]–[35]. In the3PC setting, [6], [22] provide efficient semi-honest protocolswherein ASTRA [6] improved upon [22] by casting theprotocols in the preprocessing model and provided a fast onlinephase. ASTRA further provided security with fairness in themalicious setting with an improved online phase comparedto [23]. Later, a maliciously-secure 3PC protocol based ondistributed zero-knowledge techniques was proposed by Bonehet al. [29] providing abort security. Further, building on [29]and enhancing the security to GOD, Boyle et al. [30] proposeda concretely efficient 3PC protocol with an amortized commu-nication cost of 3 field elements (can be extended to workover rings) per multiplication gate. Concurrently, BLAZE [9]provided a fair protocol in the preprocessing model, whichrequired communicating 3 ring elements in each phase. How-ever, BLAZE eliminated the reliance on the computationallyintensive distributed zero-knowledge system (whose efficiencykicks in for large circuit or many multiplication gates) fromthe online phase and pushed it to the preprocessing phase. Thisresulted in a faster online phase compared to [30].

In the regime of 4PC, Gordon et al. [36] presented proto-cols achieving abort security and GOD. However, [36] reliedon expensive public-key primitives and broadcast channels toachieve GOD. Trident [8] improved over the abort protocolof [36], providing a fast online phase achieving securitywith fairness, and presented a framework for mixed worldcomputations [20]. A robust 4PC protocol was provided in

FLASH [7] which requires communicating 6 ring elements,each, in the preprocessing and online phases.

In the PPML domain, MPC has been used for variousML algorithms such as Decision Trees [37], Linear Regression[38], [39], k-means clustering [40], [41], SVM Classification[42], [43], Logistic Regression [44]. In the 3PC SOC setting,the works of ABY3 [4] and SecureNN [5], provide securitywith abort. This was followed by ASTRA [6], which improvesupon ABY3 and achieves security with fairness. ASTRApresents primitives to build protocols for Linear Regressionand Logistic Regression inference. Recently, BLAZE improvesover the efficiency of ASTRA and additionally tackles trainingfor the above ML tasks, which requires building additionalPPML building blocks, such as truncation and bit to arithmeticconversions. In the 4PC setting, the first robust framework forPPML was provided by FLASH [7] which proposed efficientbuilding blocks for ML such as dot product, truncation, MSBextraction, and bit conversion. The works of [1], [4]–[9] workover rings to garner practical efficiency. In terms of efficiency,BLAZE and respectively FLASH and Trident are the closestcompetitors of this work in 3 and 4 party settings. We nowpresent our contributions and compare them with these works.

A. Our Contributions

We propose a robust maliciously-secure framework forPPML in the SOC setting, SWIFT, with a set of 3 and 4servers having an honest-majority. At the heart of our frame-work lies highly-efficient, maliciously-secure, 3PC and 4PCover rings (both Z2` and Z21 ) that provide GOD in the honest-majority setting. We cast our protocols in the preprocessingmodel which helps to push the computationally intensive tasksinto the preprocessing phase, resulting in a fast online phase.Apart from PPML, our framework also supports biometricmatching.

To the best of our knowledge, SWIFT is the first robust andefficient PPML framework in the 3PC setting and is as fast asthe best known fair 3PC framework BLAZE [9]. We extendour 3PC framework for 4 parties. In this regime, SWIFT isas fast as the best known fair 4PC framework Trident [8] andtwice faster than best known robust 4PC framework FLASH[7]. We next detail our framework, followed by an overviewof technical novelties:

a) Robust 3/4PC frameworks: The framework consistsof a range of primitives realized in a privacy-preservingway which is ensured via running computation in a secret-shared fashion. Secret-sharing tolerating up to one maliciouscorruption is the basis for all our constructions. We use thesharing over both Z2` and its special instantiation Z21 andrefer them as arithmetic and respectively boolean sharing. Ourframework consists of realizations for all primitives needed forgeneral MPC and PPML such as multiplication, dot-product,truncation, bit extraction (given arithmetic sharing of a value v,this is used to generate boolean sharing of the most significantbit (msb) of the value), bit to arithmetic sharing conversion(converts the boolean sharing of a single bit value to its arith-metic sharing), bit injection (computes the arithmetic sharingof b ·v, given the boolean sharing of a bit b and the arithmeticsharing of a ring element v) and above all input sharingand output reconstruction. The performance comparison in

2

Page 3: SWIFT: Super-fast and Robust Privacy-Preserving Machine ...Nishat Koti , Mahak Pancholi , Arpita Patra , Ajith Suresh Department of Computer Science and Automation, Indian Institute

BuildingBlocks

3PC 4PC

Ref.Pre. Online

Security Ref.Pre. Online

SecurityComm. (`) Rounds Comm. (`) Comm. (`) Rounds Comm. (`)

Multiplication

[29] 1 1 2 Abort[30] - 3 3 GOD Trident 3 1 3 Fair

BLAZE 3 1 3 Fair FLASH 6 1 6 GODSWIFT 3 1 3 GOD SWIFT 3 1 3 GOD

Dot ProductTrident 3 1 3 Fair

BLAZE 3n 1 3 Fair FLASH 6 1 6 GODSWIFT 3n 1 3 GOD SWIFT 3 1 3 GOD

Dot Productwith Truncation

Trident 6 1 3 FairBLAZE 3n + 2 1 3 Fair FLASH 8 1 6 GODSWIFT 3n + 2 1 3 GOD SWIFT 4 1 3 GOD

BitExtraction

Trident ≈ 8 log `+ 1 ≈ 7 FairBLAZE 9 1 + log ` 9 Fair FLASH 14 log ` 14 GODSWIFT 9 1 + log ` 9 GOD SWIFT ≈ 7 log ` ≈ 7 GOD

Bit toArithmetic

Trident ≈ 3 1 3 FairBLAZE 9 1 4 Fair FLASH 6 1 8 GODSWIFT 9 1 4 GOD SWIFT ≈ 3 1 3 GOD

BitInjection

Trident ≈ 6 1 3 FairBLAZE 12 2 7 Fair FLASH 8 2 10 GODSWIFT 12 2 7 GOD SWIFT ≈ 6 1 3 GOD

– Notations: ` - size of ring in bits, n - size of vectors for dot product, κ - computational security parameter.

TABLE I: 3PC and 4PC: Comparison of SWIFT with its closest competitors in terms of Communication and Round Complexity

terms of concrete cost for communication and round, for thepreprocessing and online phase of PPML primitives, for both3PC and 4PC, appear in Table I. Akin to our earlier claim,SWIFT is neck to neck with BLAZE for each primitive (whileimproving security from fairness to robustness). The same istrue for SWIFT and Trident in the 4-party case. On the otherhand, SWIFT is doubly faster than FLASH.

We now point out a few lucrative features that our frame-work offers, some of which are shared with the earlier workson PPML. First, by resorting to the preprocessing model, weachieve dot product with online cost completely independent ofvector size. This brings a massive gain since dot product is oneof the most invoked primitives in PPML. The other primitivesalso show improved performance in the online phase in thepreprocessing paradigm. The multiplication protocol gives atechnical basis for our dot product protocol. Instead of usingthe multiplication of [30] (which has the same overall com-munication cost as that of our online phase), we build a newprotocol that gives rise to a dot product with the above feature.Also, the multiplication protocol of [30] involves distributedzero-knowledge protocols. The cost of this heavy machinerygets amortized over, only for large circuits having millions ofgates, which is very unlikely for inferences, and moderatelyheavy training in the PPML domain. Second, extending tothe 4-party setting brings to the table a flurry of performanceimprovements over 3PC. Most prominent of all is a dot productwith cost totally independent of vector size, which remains asa challenging open question in 3PC setting. Third, the onlytwo tasks, input sharing and output reconstruction, carriedout by the users of our framework are very light-weight, inspite of offering GOD. As a final remark, we note that theroles of the servers in our framework are asymmetric andconsequently, we only require active participation from twoof the servers, while the remaining server(s) is(are) brought in

just for the verification towards the end of each phase. In acloud setting, this may provide additional benefit in terms ofmonetary cost [45].

We demonstrate the practicality of our protocols by bench-marking Biometric Matching and PPML. For the latter, Logis-tic Regression (training and inference) and Neural Networks(inference) are considered. The NN training requires mixed-world conversions [4], [8], [20], which we leave as futurework. Our PPML blocks can be used to perform training andinference of Linear Regression, Support Vector Machines, andBinarized Neural Networks (as demonstrated in [6]–[9]).

b) Overview of Techniques: Robustness is known tobe desirable, yet a costly goal. We work against the pre-assumption on cost, at least for our concerned setting. Startingfrom BLAZE [9], we overcome a series of hurdles to achieveGOD.

Relay primitive with rate-1 communication. We introduce anew primitive called Joint Message Passing (jmp) that allowstwo servers to relay a common message to the third serversuch that either the relay is successful or an honest server (ora conflicting pair) is identified. The striking feature of jmpis that it offers a rate-1 communication i.e. for a messageof ` elements, it only incurs a communication of ` elements(in an amortized sense). Without any extra cost, it allows usto replace several pivotal private communications, that maylead to abort, either because the malicious sender does notsend anything or sends a wrong message. Being two-senderprimitive, it has one guaranteed honest sender who (in somesense) guards the correct send. Using this primitive, we createa win-win scenario as below. Either the send is successful (andconsequently the computation) or an honest server is identified.All our primitives invoke jmp and as a result, the final protocol,either for a general 3PC or a PPML task, requires invocations

3

Page 4: SWIFT: Super-fast and Robust Privacy-Preserving Machine ...Nishat Koti , Mahak Pancholi , Arpita Patra , Ajith Suresh Department of Computer Science and Automation, Indian Institute

of many jmp primitives. To leverage amortization, the primitiveis designed to have two phases, send and verify, where the sendphase is carried out on the flow of the protocol and the verifyphase takes place once and for all in the end. If the verificationgoes through (meaning all sends via jmp are successful) thecomputation carried out already achieves GOD. In the lattercase, our computation completes by labelling the honest serveras a trusted third party and allowing it to simply carry out thecomputation centrally. To shoot for GOD, our next step is toconform to all the protocols to be able to use jmp primitive.

Conforming protocols. Among all our protocols, the majorchallenge came from the multiplication protocol of BLAZE(which is our starting point). Our approach is to manipulate andtransform some of the protocol steps so that the informationrequired by a server in a round can be locally computed by twoother servers. But this transformation is not straight forwardsince BLAZE was constructed with a focus towards providingonly fairness. We demonstrate with an example below. Weconsider the online phase of multiplication in BLAZE at avery high level. Let P0, P1, P2 be the three servers. P1, P2

need to locally compute additive shares of value v, denotedby v1, v2 and mutually exchange the shares to obtain v inclear. To prevent P1, P2 from cheating, P0 is equipped withpreprocessing data that allows it to compute v? = v+δ locally.Here δ is a common value possessed by both P1, P2 and isunknown to P0. P0 then sends v? in clear to both P1, P2 whoabort in the case of any inconsistency. It is only P0 who hasenough information to compute v?, and such an arrangementmakes it difficult to achieve GOD in BLAZE. Hence, wemodify the online phase in such a way that the pairs (P0, P1)and (P0, P2) can locally compute one of the two additiveshares of v?. Servers then communicate the missing share usingjmp primitive. Upon obtaining v?, P1, P2 locally compute vusing δ which is known to them. This transformation in theonline phase of the multiplication protocol demands a freshpreprocessing phase from scratch. Thus, it is the combinationof jmp primitive, a new preprocessing phase, along with therestructuring of the online phase that helps us to obtain highersecurity guarantee of GOD without affecting the communica-tion cost.

Improved jmp primitive for 4PC. In a 4-party setting, weprovide an improved instantiation of the jmp primitive, whichforgoes the broadcast channel, while retaining the rate-1property. Whereas, our 3-party instantiation uses a broadcast.We note that, in the 4PC case, our jmp primitive achievesa goal similar to the “bi-convey primitive” of FLASH [7].However, jmp is more efficient. Further, it allows identifyingan honest server, as opposed to two honest servers locallyidentifying each other in bi-convey, helping to craft a cleanand swift completion after this event. We defer other details to§IV. Using jmp, we present better/simpler protocols than 3PCcounterparts, specifically the multiplication and dot product.

Robust and Improved Input Sharing and Output Reconstruc-tion. We provide robust protocols for the input sharing andoutput reconstruction phase in the SOC setting, wherein a usershares its input with the servers and the output is reconstructedtowards a user. The need for robustness and light-weightlesstogether makes these slightly non-trivial. As a highlight, weintroduce a super-fast online phase for the reconstructionprotocol, which gives 4× improvement in terms of rounds

(apart from improvement in communication as well) comparedto BLAZE. We make sure that a user neither takes part in ajmp protocol nor in a broadcast, both of which need severalrounds of communication and are relatively expensive thanatomic point-to-point communication.

B. Organisation of the paper

The rest of the paper is organized as follows. In §II wedescribe the system model, preliminaries and notations used.§III and §IV detail our constructs in the 3PC and respectively4PC setting. These are followed by the applications andbenchmarking in §V. The appendix §A elaborates on thepreliminaries. Protocols to complete the PPML framework anddetailed cost analysis for all the 3PC and 4PC protocols areprovided in appendix §B and §C respectively. The securityproofs for our constructions follow in appendix §D.

II. PRELIMINARIES

We consider a set of three servers P = {P0, P1, P2} thatare connected by pair-wise private and authentic channels ina synchronous network, and a static, malicious adversary thatcan corrupt at most one server. We use a broadcast channel for3PC alone, which is inevitable [46]. For ML training, severaldata-owners who wish to jointly train a model, secret share(using the sharing semantics which appear in the latter part ofthe paper) their data among the servers. For ML inference, amodel-owner and client secret share the model and the query,respectively, among the servers. Once the inputs are availablein the shared format, the servers perform computations andobtain the output in the shared form. In the case of training,the output model is reconstructed towards the data-owners,whereas for inference, the prediction result is reconstructedtowards the client. We assume that an arbitrary number of data-owners may collude with a corrupt server for training, whereasfor the case of prediction, we assume that either the model-owner or the client can collude with a corrupt server. We provethe security of our protocols using standard real-world / ideal-world paradigm. We also explore the above model for the fourserver setting with P = {P0, P1, P2, P3}. The aforementionedsetting has been explored extensively [1], [4], [6]–[9].

Our constructions achieve the strongest security guaranteeof GOD. A protocol is said to be robust or achieve GODif all parties obtain the output of the protocol regardless ofhow the adversary behaves. In our model, this translates to allthe data owners obtaining the trained model for the case ofML training, while the client obtains the query output for MLinference. All our protocols are cast into: input-independentpreprocessing phase and input-dependent online phase.

For 3/4PC, the function to be computed is expressed as acircuit ckt, whose topology is public, and is evaluated overan arithmetic ring Z2` or boolean ring Z21 . For PPML, weconsider computation over the same algebraic structure. Todeal with floating-point values, we use Fixed-Point Arithmetic(FPA) [1], [4], [6]–[9] representation in which a decimal valueis represented as an `-bit integer in signed 2’s complementrepresentation. The most significant bit (MSB) represents thesign bit and x least significant bits are reserved for thefractional part. The `-bit integer is then treated as an elementof Z2` and operations are performed modulo 2`. We set ` = 64and x = 13, leaving `− x− 1 bits for the integral part.

4

Page 5: SWIFT: Super-fast and Robust Privacy-Preserving Machine ...Nishat Koti , Mahak Pancholi , Arpita Patra , Ajith Suresh Department of Computer Science and Automation, Indian Institute

The servers use a one-time key setup, modelled as afunctionality Fsetup (Fig. 14), to establish pre-shared randomkeys for pseudo-random functions (PRF) between them. Asimilar setup is used in [3], [4], [6], [9], [23], [26], [30] for3 server case, and in [7], [8] for 4 server setting. The key-setup can be instantiated using any standard MPC protocolin the respective setting. Further, our protocols make use ofa collision-resistant hash function, denoted by H(), and acommitment scheme, denoted by Com(). The formal detailsof key setup, hash function, and the commitment scheme aredeferred to §A.

Notation II.1. The ith element of a vector ~x is denoted as xi.The dot product of two n length vectors, ~x and ~y, is computedas ~x� ~y =

∑ni=1 xiyi. For two matrices X,Y, the operation

X◦Y denotes the matrix multiplication. The ith bit of an `-bitvalue v is denoted by v[i].

Notation II.2. For a bit b ∈ {0, 1}, we use bR to denote theequivalent value of b over the ring Z2` . bR will have its leastsignificant bit set to b, while all other bits will be set to zero.

III. ROBUST 3PC AND PPML

In this section, we first introduce the sharing semanticsfor three servers. Then, we introduce our new Joint MessagePassing (jmp) primitive, which plays a crucial role in obtainingthe strongest security guarantee of GOD, followed by ourprotocols in the three server setting.

a) Secret Sharing Semantics: We use the followingsecret-sharing semantics.◦ [·]-sharing: A value v ∈ Z2` is [·]-shared among P1, P2, ifPs for s ∈ {1, 2} holds [v]s ∈ Z2` such that v = [v]1 + [v]2.◦ 〈·〉-sharing: A value v ∈ Z2` is 〈·〉-shared among P , if– there exists v0, v1, v2 ∈ Z2` such that v = v0 + v1 + v2.– Ps holds (vs, v(s+1)%3) for s ∈ {0, 1, 2}.◦ J·K-sharing: A value v ∈ Z2` is J·K-shared among P , if– there exists αv ∈ Z2` that is [·]-shared among P1, P2.– there exists βv, γv ∈ Z2` such that βv = v + αv and P0

holds ([αv]1 , [αv]2 , βv+γv) while Ps for s ∈ {1, 2} holds([αv]s , βv, γv).

b) Arithmetic and Boolean Sharing: Arithmetic sharingrefers to sharing over Z2` while boolean sharing, denoted asJ·KB, refers to sharing over Z21 .

c) Linearity of the Secret Sharing Scheme: Given the[·]-shares of v1, v2, and public constants c1, c2, servers canlocally compute the [·]-share of c1v1 +c2v2 as c1 [v1]+c2 [v2].It is trivial to see that the linearity property is satisfied by 〈·〉and J·K-sharing as well.

A. Joint Message Passing primitive

The jmp primitive allows two servers to relay a commonmessage to the third server such that either the relay is success-ful or an honest server (or a conflicting pair) is identified. Thestriking feature of jmp is that it offers a rate-1 communicationi.e. for a message of ` elements, it only incurs a communicationof ` elements (in an amortized sense).

The task of jmp is captured in an ideal function-ality (Fig. 1) below and the protocol (Fig. 2) realiz-ing the functionality appears subsequently followed by anoverview.

Fjmp interacts with the servers in P and the adversary S.Step 1: Fjmp receives (Input, vs) from Ps for s ∈ {i, j}, while

it receives (Select, ttp) from S. Here ttp denotes the server thatS wants to choose as the TTP. Let P ? ∈ P denote the servercorrupted by S.

Step 2: If vi = vj and ttp = ⊥, then set msgi = msgj =⊥,msgk = vi and go to Step 5.

Step 3: If ttp ∈ P\{P ?}, then set msgi = msgj = msgk = ttp.Step 4: Else, TTP is set to be the honest party with smallest

index. Set msgi = msgj = msgk = TTP

Step 5: Send (Output,msgs) to Ps for s ∈ {0, 1, 2}.

Functionality Fjmp

Fig. 1: 3PC: Ideal functionality for jmp primitive

– Each server Ps for s ∈ {i, j, k} initializes bit bs = 0.– Pi sends v to Pk, while Pj sends H(v) to Pk.– Pk broadcasts "(accuse,Pi)", if Pi is silent and TTP = Pj .

Analogously for Pj . If Pk accuses both Pi, Pj , then TTP = Pi.Otherwise, Pk receives some v and either sets bk = 0 when thevalue and the hash are consistent or sets bk = 1. Pk then sendsbk to Pi, Pj and terminates if bk = 0.

– If Pi does not receive a bit from Pk, it broadcasts"(accuse,Pk)" and TTP = Pj . Analogously for Pj . If bothPi, Pj accuse Pk, then TTP = Pi. Otherwise, Ps for s ∈ {i, j}sets bs = bk.

– Pi, Pj exchange their bits to each other. If Pi does not receivebj from Pj , it broadcasts "(accuse,Pj)" and TTP = Pk.Analogously for Pj . Otherwise, Pi resets its bit to bi ∨ bj andlikewise Pj resets its bit to bj ∨ bi.

– Ps for s ∈ {i, j, k} broadcasts Hs = H(v∗) if bs = 1, wherev∗ = v for s ∈ {i, j} and v∗ = v otherwise. If Pk does notbroadcast, terminate. If either Pi or Pj does not broadcast, thenTTP = Pk. Otherwise,• If Hi 6= Hj : TTP = Pk.• Else if Hi 6= Hk: TTP = Pj .• Else if Hi = Hj = Hk: TTP = Pi.

Protocol Πjmp(Pi, Pj , Pk, v)

Fig. 2: 3PC: Joint Message Passing Protocol

Given two servers Pi, Pj possessing a common valuev ∈ Z2` , protocol Πjmp proceeds as follows. First, Pi sendsv to Pk while Pj sends a hash of v to Pk. The communicationof v is done once and for all from Pi to Pk. In the simplestcase, Pk receives consistent (value, hash) pair, and the protocolterminates. In all other cases, a TTP is identified as followswithout having to communicate v again. Importantly, thispart can be run once and for all instances of Πjmp withPi, Pj , Pk in the same roles, invoked in the final 3PC protocol.Consequently, the cost relevant to this part vanishes in anamortized sense, making the construction rate-1.

Each Ps for s ∈ {i, j, k} maintains a bit bs initializedto 0, as an indicator for inconsistency. When Pk receivesinconsistent (value, hash) pair, it sets bk = 1 and sends the bitto both Pi, Pj , who cross-check with each other by exchangingthe bit and turn on their inconsistency bit if the bit received

5

Page 6: SWIFT: Super-fast and Robust Privacy-Preserving Machine ...Nishat Koti , Mahak Pancholi , Arpita Patra , Ajith Suresh Department of Computer Science and Automation, Indian Institute

from either Pk or its fellow sender is turned on. A partybroadcasts a hash of its value when its inconsistency bit ison;3 Pk’s value is the one it receives from Pi. At this stage,there are a bunch of possible cases and a detailed analysisdetermines an eligible TTP in each case.

When Pk is silent, the protocol is understood to be com-plete. This is fine irrespective of the status of Pk– an honestPk never skips this broadcast with inconsistency bit on and acorrupt Pk implies honest senders. If either Pi or Pj is silent,then Pk is picked as TTP which is surely honest. A corrupt Pkcould not make one of {Pi, Pj} speak as the senders (honest inthis case) are on agreement on their inconsistency bit (due totheir mutual exchange of inconsistency bit). When all of themspeak and (i) the senders’ hashes do not match, Pk is picked asTTP; (ii) one of the senders conflicts with Pk, the other senderis picked as TTP; and lastly (iii) if there is no conflict, Pi ispicked as TTP. The first two cases are self-explanatory. In thelast case, either Pj or Pk is corrupt. Because a corrupt Pi canhave honest Pk speak (and hence turn on its inconsistency bit),by sending a v′ whose hash is not same as that of v and soinevitably the hashes of honest Pj and Pk will conflict.

As a final touch, we ensure that, in each step, a party raisesa public alarm (via broadcast) accusing a party who is silentwhen it is not supposed to be, and the protocol terminatesimmediately by labelling the party as TTP who is neither thecomplainer nor the accused.

Notation III.1. We say that Pi, Pj jmp-send v to Pk whenthey invoke Πjmp(Pi, Pj , Pk, v).

Using jmp in protocols. As mentioned in the introduction,the protocol for jmp needs to be viewed consisting of twophases (send, verify), where send phase consists of Pi sendingv to Pk and the rest goes to verify phase. Looking ahead,most of our protocols use jmp, and consequently our finalconstruction, either of general MPC or any PPML task willhave several calls to jmp. To leverage amortization, the sendphase will be executed in all protocols invoking jmp on theflow, while the verify for a fixed ordered pair of senders willbe executed once and for all in the end. The verify phasewill determine if all the sends were correct. If not, a TTP isidentified, as explained, and the computation completes withthe help of TTP, just as in the ideal-world.

B. 3PC Protocols

We now describe the protocols for 3 parties/servers.

a) Sharing Protocol: Protocol Πsh (Fig. 3) allows aserver Pi to generate J·K-shares of a value v ∈ Z2` . In thepreprocessing phase, P0, Pj for j ∈ {1, 2} along with Pisample a random [αv]j ∈ Z2` , while P1, P2, Pi sample randomγv ∈ Z2` . This allows Pi to know both αv and γv in clear.During the online phase, if Pi = P0, then P0 sends βv = v+αv

to P1. P0, P1 then jmp-send βv to P2 to complete the secretsharing. If Pi = P1, P1 sends βv = v +αv to P2. Then P1, P2

jmp-send βv+γv to P0. The case for Pi = P2 proceeds similarto that of P1. The correctness of the shares held by each serveris assured by the guarantees of Πjmp.

3This hash can be computed on a combined message across many calls ofjmp.

Preprocessing:

– If Pi = P0 : P0, Pj , for j ∈ {1, 2}, together sample random[αv]j ∈ Z2` , while P together sample random γv ∈ Z2` .

– If Pi = P1 : P0, P1 together sample random [αv]1 ∈ Z2` , whileP together sample a random [αv]2 ∈ Z2` . Also, P1, P2 togethersample random γv ∈ Z2` .

– If Pi = P2: Symmetric to the case when Pi = P1.

Online:

– If Pi = P0 : P0 computes βv = v + αv and sends βv to P1.P1, P0 jmp-send βv to P2.

– If Pi = Pj , for j ∈ {1, 2} : Pj computes βv = v + αv, sendsβv to P3−j . P1, P2 jmp-send βv + γv to P0.

Protocol Πsh(Pi, v)

Fig. 3: 3PC: Generating JvK-shares by server Pi

b) Joint Sharing Protocol: Protocol Πjsh (Fig. 16) al-lows two servers Pi, Pj to jointly generate a J·K-sharing of avalue v ∈ Z2` that is known to both. Towards this, serversexecute the preprocessing of Πsh (Fig. 3) to generate [αv] andγv. If (Pi, Pj) = (P1, P0), then P1, P0 jmp-send βv = v+αv toP2. The case when (Pi, Pj) = (P2, P0) proceeds similarly. Thecase for (Pi, Pj) = (P1, P2) is optimized further as follows:servers locally set [αv]1 = [αv]2 = 0. P1, P2 together samplerandom γv ∈ Z2` , set βv = v and jmp-send βv + γv to P0. Wedefer the formal details of Πjsh to §B-C.

c) Addition Protocol: Given the J·K-shares on inputwires x, y, servers can use the linearity property of the sharingscheme to locally compute J·K-shares of the output of additiongate, z = x + y as JzK = JxK + JyK.

d) Multiplication Protocol: Protocol Πmult(P, JxK, JyK)(Fig. 4) enables the servers in P to compute J·K-sharing ofz = xy, given the J·K-sharing of x and y. We build onthe protocol of BLAZE [9] and discuss along the way thedifferences and resemblances. We begin with a protocol for thesemi-honest setting, which is also the starting point of BLAZE.During the preprocessing phase, P0, Pj for j ∈ {1, 2} samplerandom [αz]j ∈ Z2` , while P1, P2 sample random γz ∈ Z2` .In addition, P0 locally computes Γxy = αxαy and generates[·]-sharing of the same between P1, P2. Since

βz = z + αz = xy + αz = (βx − αx)(βy − αy) + αz

= βxβy − βxαy − βyαx + Γxy + αz (1)

holds, servers P1, P2 locally compute [βz]j = (j−1)βxβy−βx [αy]j − βy [αx]j + [Γxy]j + [αz]j during the online phaseand mutually exchange their shares to reconstruct βz. P1 thensends βz +γz to P0, completing the semi-honest protocol. Thecorrectness that asserts z = xy or in other words βz−αz = xyholds due to Eq. 1.

The following issues arise in the above protocol when amalicious adversary is considered:

1) When P0 is corrupt, the [·]-sharing of Γxy performed by P0

might not be correct, i.e. Γxy 6= αxαy.2) When P1 (or P2) is corrupt, the [·]-share of βz handed over

to the fellow honest evaluator during the online phase mightnot be correct, causing reconstruction of an incorrect βz.

3) When P1 is corrupt, the value βz + γz that is sent to P0

during the online phase may not be correct.

6

Page 7: SWIFT: Super-fast and Robust Privacy-Preserving Machine ...Nishat Koti , Mahak Pancholi , Arpita Patra , Ajith Suresh Department of Computer Science and Automation, Indian Institute

All the three issues are common with BLAZE (copiedverbatim), but we differ from BLAZE in handling them. Webegin with solving the last issue first. We simply make P1, P2

jmp-send βz + γz to P0 (after βz is computed). This eitherleads to success or a TTP selection. Due to jmp’s rate-1communication, P1 alone sending the value to P0 remains ascostly as using jmp in amortized sense. Whereas in BLAZE,the malicious version simply makes P2 to send a hash of βz+γzto P0 (in addition to P1’s communication of βz + γz to P0),who aborts if the received values are inconsistent.

For the remaining two issues, similar to BLAZE, we reduceboth to a multiplication (on values unrelated to inputs) inthe preprocessing phase. However, our method leads to eithersuccess or TTP selection, with no additional cost.

We start with the second issue. To solve it, where a corruptP1 (or P2) sends an incorrect [·]-share of βz, BLAZE makesuse of server P0 to compute a version of βz for verification,based on βx and βy, as follows. Using βx + γx and βy + γy,which are already available to P0 as a part of JxK, JyK, P0

computes:

β?z = −(βx + γx)αy − (βy + γy)αx + 2Γxy + αz

= (βz − βxβy)− (γxαy + γyαx − Γxy + ψ) + ψ [by Eq. 1]

= (βz − βxβy + ψ)− χ [where χ = γxαy + γyαx − Γxy + ψ]

Assuming that (a) ψ ∈ Z2` is a random value sampledtogether by P1 and P2 (and unknown to P0) for securing theβ values from P0 and (b) P0 knows the value χ, P0 can sendβ?z + χ to P1 and P2 who using the knowledge of βx, βy andψ can verify the correctness of βz by computing βz−βxβy +ψand checking against the value β?z + χ received from P0. Therest of the logic in BLAZE goes on to discuss how to enforceP0– (a) to compute a correct χ (when honest) and (b) to sharecorrect Γxy (when corrupt) via a single multiplication of twovalues in the preprocessing phase. Tying the ends together,one of their innovations goes in identifying the precise sharedmultiplication triple and mapping its components to χ and Γxy

so that these are correct by the virtue of the correctness of themultiplication relation.

Preprocessing:

– P0, Pj for j ∈ {1, 2} together sample random [αz]j ∈ Z2` ,while P1, P2 sample random γz ∈ Z2` .

– Servers in P locally compute 〈·〉-sharing of d = γx + αx ande = γy + αy by setting the shares as follows (ref. Table II):

(d0= [αx]2 , d1= [αx]1 , d2=γx), (e0= [αy]2 , e1= [αy]1 , e2=γy)

– Servers in P execute ΠmulPre(P, d, e) to generate 〈f〉 = 〈de〉.– P0, P1 locally set [χ]1 = f1, while P0, P2 locally set [χ]2 = f0.P1, P2 locally compute ψ = f2 − γxγy.

Online:

– P0, Pj , for j ∈ {1, 2}, compute [β?z ]j = −(βx + γx) [αy]j −

(βy + γy) [αx]j + [αz]j + [χ]j .– P0, P1 jmp-send [β?

z ]1 to P2 and P0, P2 jmp-send [β?z ]2 to P1.

– P1, P2 compute β?z = [β?

z ]1+[β?z ]2 and set βz = β?

z +βxβy+ψ.– P1, P2 jmp-send βz + γz to P0.

Protocol Πmult(P, JxK, JyK)

Fig. 4: 3PC: Multiplication Protocol (z = x · y)

We differ from BLAZE in several ways. First, we do notsimply rely on P0 for the verification information β?z + χ,as this may inevitably lead to abort when P0 is corrupt.Instead, we find (a slightly different) β?z that, instead of entirelyavailable to P0, will be available in [·]-shared form betweenthe two teams {{P0, Pi}}i∈{1,2}, with both servers in {P0, Pi}holding ith share [β?z ]i. With this edit, the ith team canjmp-send the ith share of β?z to the third party which computesβ?z . Due to the presence of one honest party in each team,this β?z is correct and P1, P2 directly use it to compute βz,with the knowledge of ψ, βx, βy. This means, departing fromBLAZE and the starting semi-honest construction, P1 and P2

compute βz from β?z . Whereas, BLAZE suggests computing βzfrom the exchange P1, P2’s respective share of βz and use β?zfor verification. The outcome is a win-win situation i.e. eithersuccess or TTP selection. Our new β?z and χ are:

χ = γxαy + γyαx + Γxy − ψ andβ?z = −(βx + γx)αy − (βy + γy)αx + αz + χ

= (−βxαy − βyαx + Γxy + αz)− ψ = βz − βxβy − ψ

Clearly, both P0 and Pi can compute [β?z ]i = −(βx +γx) [αy]i− (βy + γy) [αx]i + [αz]i + [χ]i given [χi]. The rest ofour discussion explains how (a) ith share of [χ] can be madeavailable to {P0, Pi} and (b) ψ can be derived by P1, P2, froma multiplication triple. Similar to BLAZE, yet for a differenttriple, we observe that (d, e, f) is a multiplication triple, whered = (γx + αx), e = (γy + αy), f = (γxγy + ψ) + χ if and onlyif χ and Γxy are correct. Indeed,

de = (γx + αx)(γy + αy) = γxγy + γxαy + γyαx + Γxy

= (γxγy + ψ) + (γxαy + γyαx + Γxy − ψ)

= (γxγy + ψ) + χ = f

Based on this observation, we compute the above multipli-cation triple using a multiplication protocol and extract outthe values for ψ and χ from the shares of f which arebound to be correct. This can be executed entirely in thepreprocessing phase. Specifically, the servers (a) locally obtain〈·〉-shares of d, e as in Table II, (b) compute 〈·〉-shares off(= de), say denoted by f0, f1, f2, using an efficient, robust3-party multiplication protocol, say ΠmulPre (abstracted in afunctionality Fig. 17) and finally (c) extract out the requiredpreprocessing data locally as in Eq. 2. We switch to 〈·〉-sharingin this part to be able to use the best robust multiplicationprotocol of [30] that supports this form of secret sharing andrequires communication of just 3 elements. Fortunately, theswitch does not cost anything, as both the first and third stepsinvolve local computation and the cost simply reduces to asingle run of a multiplication protocol.

P0 P1 P2

〈v〉 (v0, v1) (v1, v2) (v2, v0)

〈d〉 ([αx]2 , [αx]1) ([αx]1 , γx) (γx, [αx]2)

〈e〉 ([αy]2 , [αy]1) ([αy]1 , γy) (γy, [αy]2)

TABLE II: The 〈·〉-sharing of values d and e

[χ]2 ← f0, [χ]1 ← f1, γxγy + ψ ← f2. (2)

According to 〈·〉-sharing, both P0 and P1 obtain f1 and henceobtain [χ]1. Similarly, P0, P2 obtain f0 and hence [χ]2. Finally,

7

Page 8: SWIFT: Super-fast and Robust Privacy-Preserving Machine ...Nishat Koti , Mahak Pancholi , Arpita Patra , Ajith Suresh Department of Computer Science and Automation, Indian Institute

P1, P2 obtain f2 from which they compute ψ = f2−γxγy. Thiscompletes the informal discussion.

We note that to facilitate a fast online phase for multiplica-tion, our preprocessing phase leverages a robust multiplicationprotocol [30] in a black-box manner to derive the necessarypreprocessing information. A similar black-box approach isalso taken for the dot product protocol in the preprocessingphase. This leaves room for further improvements in commu-nication cost, which can be obtained by instantiating the black-box with an efficient robust dot product protocol, coupled withthe fast online phase.

e) Reconstruction Protocol: Protocol Πrec (Fig. 5) al-lows servers to robustly reconstruct value v ∈ Z2` fromits J·K-shares. Note that each server misses one share of vwhich is held by the other two servers. Consider the case ofP0 who requires γv to compute v. During the preprocessing,P1, P2 compute a commitment of γv, denoted by Com(γv) andjmp-send the same to P0. Similar steps are performed for thevalues [αv]2 and [αv]1 that are required by servers P1 and P2

respectively. During the online phase, servers open their com-mitments to the intended server who accepts the opening that isconsistent with the agreed upon commitment.

Preprocessing:

– P0, Pj , for j ∈ {1, 2}, compute Com([αv]j), while P1, P2

compute Com(γv).– P1, P2 jmp-send Com(γv) to P0, while P0, P1 jmp-send

Com([αv]1) to P2, and P0, P2 jmp-send Com([αv]2) to P1,

Online:

– P0, P1 open Com([αv]1) to P2. P0, P2 open Com([αv]2) to P1.P1, P2 open Com(γv) to P0.

– Each of the servers accept the opening that is consistent with theagreed upon commitment. P1, P2 compute v = βv−[αv]1−[αv]2,while P0 computes v = (βv + γv)− [αv]1 − [αv]2 − γv.

Protocol Πrec(P, JvK)

Fig. 5: 3PC: Reconstruction of v among the servers

f) The Complete 3PC: For the sake of completenessand to demonstrate how GOD is achieved, we show how tocompile the above primitives for a general 3PC. The main pur-pose is to explain the usage of jmp in a complete computation.A similar approach will be taken for 4PC and each PPML task(both training and inference) and we will avoid repetition. Inorder to compute an arithmetic circuit over Z2` , we first invokethe key-setup functionality Fsetup (Fig. 14) for key distributionand preprocessing of Πsh, Πmult and Πrec, as per the givencircuit. During the online phase, Pi ∈ P shares its input xiby executing online steps of Πsh (Fig. 3) protocol. This isfollowed by the circuit evaluation phase, where severs evaluatethe gates in the circuit in the topological order, with additiongates (and multiplication-by-a-constant gates) being computedlocally, and multiplication gates being computed via onlineof Πmult (Fig. 4). Finally, servers run the online steps of Πrec

protocol (Fig. 5) on the output wires to reconstruct the functionoutput. All the building blocks above invoke jmp, except theonline phase of reconstruction. To leverage amortization, onlysend phases of all the jumps are run on the flow. At the endof preprocessing, and right before the reconstruction in theonline phase, the verify phase for all possible ordered pair

of senders are run. We carry on computation in the onlinephase only when the verify phases in the preprocessing aresuccessful. Otherwise, the servers simply send their inputs tothe elected TTP who computes the function and returns theresult to all the servers. Similarly, depending on the outputof the verify phase at the end of the online phase, either thereconstruction is carried out or a TTP is identified. In the lattercase, computation completes as mentioned before.

C. Building Blocks for PPML using 3PC

This section provides details on robust realizations of thefollowing building blocks for PPML in 3-server setting– i) DotProduct, ii) Truncation, iii) Dot Product with Truncation, iv)Secure Comparison, and v) Non-linear Activation functions–Sigmoid and ReLU. We begin by providing details of inputsharing and reconstruction in the SOC setting.

a) Input Sharing and Output Reconstruction in the SOCSetting: Protocol ΠSOC

sh (Fig. 6) extends input sharing tothe SOC setting and allows a user U to generate the J·K-shares of its input v among the three servers. Note that thenecessary commitments to facilitate the sharing are generatedin the preprocessing phase by the servers which are thencommunicated to U, along with the opening, in the onlinephase. U selects the commitment forming the majority (foreach share) owing to the presence of an honest majority amongthe servers, and accepts the corresponding shares. Analogously,protocol ΠSOC

rec (Fig. 6) allows the servers to reconstruct a valuev towards user U. In either of the protocols, if at any point, aTTP is identified, then servers signal the TTP’s identity to U.U selects the TTP as the one forming a majority and sends itsinput in clear to the TTP, who computes the function outputand sends it back to U.

Input Sharing:

– P0, Ps, for s ∈ {1, 2}, together sample random [αv]s ∈ Z2` ,while P1, P2 together sample random γv ∈ Z2` .

– P0, P1 jmp-send Com([αv]1) to P2, while P0, P2 jmp-sendCom([αv]2) to P1, and P1, P2 jmp-send Com(γv) to P0.

– Each of the servers sends (Com([αv]1),Com([αv]2),Com(γv))to U who accepts the values that form the majority. Also, P0, Ps,for s ∈ {1, 2}, open [αv]s towards U while P1, P2 open γvtowards U.

– U accepts the consistent opening, recovers [αv]1 , [αv]2 , γv,computes βv = v + [αv]1 + [αv]2, and sends βv + γv to all threeservers.

– Servers broadcast the received value and accept the majorityvalue if it exists, and a default value, otherwise. P1, P2 locallycompute βv from βv + γv using γv to complete the sharing of v.

Output Reconstruction:

– Servers execute the preprocessing of Πrec(P, JvK) to agree uponcommitments of [αv]1 , [αv]2 and γv.

– Each of the servers send βv + γv as well as commitments on[αv]1 , [αv]2 and γv to U, who accepts the values forming majority.

– Now, P0, P1 open [αv]1 to U, P0, P2 open [αv]2, while P1, P2

open γv to U.– U accepts the consistent opening and computes v = (βv +γv)−[αv]1 − [αv]2 − γv.

Protocol ΠSOCsh (U, v) and ΠSOC

rec (U, JvK)

Fig. 6: 3PC: Input Sharing and Output Reconstruction in SOC Setting

8

Page 9: SWIFT: Super-fast and Robust Privacy-Preserving Machine ...Nishat Koti , Mahak Pancholi , Arpita Patra , Ajith Suresh Department of Computer Science and Automation, Indian Institute

b) MSB Extraction, Bit to Arithmetic Conversion andBit Injection Protocols: Here we provide a high-level overviewof three protocols that involve working over arithmetic andboolean rings in a mixed fashion and are used for the PPMLprimitives. The Bit Extraction Protocol, Πbitext allows serversto compute the boolean sharing of the most significant bit(msb) of a value v given its arithmetic sharing JvK. The Bit2AProtocol, Πbit2A, given the boolean sharing of a bit b, denotedas JbKB, allows servers to compute the arithmetic sharing JbRK.Here bR denotes the equivalent value of b over ring Z2` (seeNotation II.2). Lastly, the Bit Injection Protocol, ΠBitInj, allowsthe servers to compute the arithmetic sharing JbvK given theboolean sharing of a bit b, denoted as JbKB and the arithmeticsharing of v ∈ Z2` .

The core techniques used in these protocols follow fromBLAZE [9], where the multiplication calls are instantiated withΠmult, and several private communications are replaced withjmp-send to ensure either successful run or TTP selection. ThePPML building blocks above can be understood without thedetails of the constructs and hence they are moved to §B-F.

c) Dot Product: Given the J·K-sharing of vectors ~x and~y, protocol Πdotp (Fig. 7) allows servers to generate J·K-sharingof z = ~x� ~y in a robust fashion. By J·K-sharing of a vector ~xof size n, we mean that each element xi ∈ Z2` in the vector,for i ∈ [n], is J·K-shared. We borrow ideas from BLAZE forobtaining an online communication cost independent of n anduse jmp primitive to ensure either success or TTP selection.Similar to our multiplication protocol that offloads one call toa robust multiplication protocol in the preprocessing, our dotproduct does the same for a robust dot product.

Preprocessing:

– P0, Pj , for j ∈ {1, 2}, together sample random [αz]j ∈ Z2` ,while P1, P2 sample random γz ∈ Z2` .

– Servers locally compute 〈·〉-sharing of ~d,~e with di = γxi +αxi

and ei = γyi + αyi for i ∈ [n] as follows:

(〈di〉0=([αxi ]2 , [αxi ]1), 〈di〉1=([αxi ]1 , γxi), 〈di〉2=(γxi , [αxi ]2))

(〈ei〉0=([αyi ]2 , [αyi ]1), 〈ei〉1=([αyi ]1 , γyi), 〈ei〉2=(γyi , [αyi ]2))

– Servers execute ΠdotpPre(P, 〈~d〉, 〈~e〉) to generate 〈f〉 = 〈~d�~e〉.– P0, P1 locally set [χ]1 = f1, while P0, P2 locally set [χ]2 = f0.P1, P2 locally compute ψ = f2 −

∑ni=1 γxiγyi .

Online:

– P0, Pj , for j ∈ {1, 2}, compute [β?z ]j = −

∑ni=1((βxi +

γxi) [αyi ]j + (βyi + γyi) [αxi ]j) + [αz]j + [χ]j .– P0, P1 jmp-send [β?

z ]1 to P2 and P0, P2 jmp-send [β?z ]2 to P1.

– P1, P2 locally compute β?z = [β?

z ]1 + [β?z ]2 and set βz = β?

z +∑ni=1(βxiβyi) + ψ.

– P1, P2 jmp-send βz + γz to P0.

Protocol Πdotp(P, {JxiK, JyiK}i∈[n])

Fig. 7: 3PC: Dot Product Protocol (z = ~x� ~y)

To begin with, z = ~x�~y can be viewed as n parallel mul-tiplication instances of the form zi = xiyi for i ∈ {1, . . . , n},followed by adding up the results. Let β?z =

∑ni=1 β

?zi . Then,

β?z = −n∑i=1

(βxi + γxi)αyi −n∑i=1

(βyi + γyi)αxi + αz + χ (3)

where χ =∑ni=1(γxiαyi + γyiαxi + Γxiyi − ψi).

Apart from the aforementioned modification, the onlinephase for dot product proceeds similar to that of multipli-cation protocol. P0, P1 locally compute [β?z ]1 as per Eq. 3and jmp-send [β?z ]1 to P2. P1 obtains [β?z ]2 in a similarfashion. P1, P2 reconstruct β?z = [β?z ]1 + [β?z ]2 and computeβz = β?z +

∑ni=1 βxiβyi + ψ. Here, the value ψ has to be

correctly generated in the preprocessing phase satisfying Eq.3. Finally, P1, P2 jmp-send βz + γz to P0.

We now provide the details for preprocessing phase thatenable servers to obtain the required values (χ, ψ) with theinvocation of a dot product protocol in a black-box way.Towards this, let ~d = [d1, . . . , dn] and ~e = [e1, . . . , en], wheredi = γxi + αxi and ei = γyi + αyi for i ∈ [n], as in the caseof multiplication. Then for f = ~d� ~e,

f = ~d� ~e =

n∑i=1

diei =

n∑i=1

(γxi + αxi)(γyi + αyi)

=

n∑i=1

(γxiγyi + ψi) +

n∑i=1

χi =

n∑i=1

(γxiγyi + ψi) + χ

=

n∑i=1

(γxiγyi + ψi) + [χ]1 + [χ]2 = f2 + f1 + f0.

where f2 =∑ni=1(γxiγyi + ψi), f1 = [χ]1 and f0 = [χ]2.

Using the above relation, the preprocessing phase proceedsas follows: P0, Pj for j ∈ {1, 2} sample a random [αz]j ∈Z2` , while P1, P2 sample random γz. Servers locally prepare〈~d〉, 〈~e〉 similar to that of multiplication protocol. Serversthen execute a robust 3PC dot product protocol, denoted byΠdotpPre, that takes 〈~d〉, 〈~e〉 as input and compute 〈f〉 withf = ~d � ~e. Given 〈f〉, the ψ and [χ] values are extracted asfollows (ref. Eq. 4):

ψ = f2 −n∑i=1

γxiγyi , [χ]1 = f1, [χ]2 = f0, (4)

It is easy to see from the semantics of 〈·〉-sharing that bothP1, P2 obtain f2 and hence ψ. Similarly, both P0, P1 obtain f1and hence [χ]1, while P0, P2 obtain [χ]2.

In this work, we instantiate ΠdotpPre using n black-boxinvocations of ΠmulPre. In the ith invocation, servers compute〈di, ei〉 and obtain the respective ψi and [χi] values. Finally,servers locally set ψ =

∑ni=1 ψi and [χ] =

∑ni=1 [χi].

The aforementioned method results in communication of 3nelements in the preprocessing phase. A protocol for ΠdotpPre

with better cost, when plugged into our Πdotp, would bringdown the preprocessing cost further while maintaining a fastonline phase. We defer the formal details of Πdotp to §B-G.

d) Truncation: Working over fixed-point values, re-peated multiplications using FPA arithmetic can lead to anoverflow resulting in loss of significant bits of information.This put forth the need for truncation [1], [4], [6], [7], [9] thatre-adjusts the shares after multiplication so that FPA semanticsare maintained. As shown in SecureML [1], the method oftruncation would result in loss of information on the leastsignificant bits and affect the accuracy by a very minimalamount only.

For truncation, servers execute Πtrgen (Fig. 8) to generate arandom pair of the form ([r] , JrdK). Here, r denotes a random

9

Page 10: SWIFT: Super-fast and Robust Privacy-Preserving Machine ...Nishat Koti , Mahak Pancholi , Arpita Patra , Ajith Suresh Department of Computer Science and Automation, Indian Institute

ring element, while rd represents the truncated value of r. Bytruncated value, we mean that the value is right-shifted byd bit positions, where d is the number of bits allocated forthe fractional part in the FPA representation. Given (r, rd), thetruncated value of v denoted by vd can be computed from vas vd = (v − r)d + rd. As shown in ABY3 [4], this methodensures the same correctness of SecureML and the accuracyis affected by a very minimal amount.

To generate ([r] , JrdK), servers proceed as follows: P0, Pjfor j ∈ {1, 2} sample random Rj ∈ Z2` . P0 locally computesr = R1+R2 and truncates r to obtain rd. P0 then executes Πsh

on rd to generate JrdK. As shown in BLAZE, the correctnessof sharing performed by P0 is checked using the relationr = 2drd + rd, where rd denotes the ring element r with thehigher order ` − d bit positions set to 0. In detail, P0, Pj forj ∈ {1, 2} locally computes [aj ] for a = (r−2drd+rd). P0, P1

then jmp-send H([a]1) to P2. P2 checks if the received hashvalue matches with H(− [a]2). In case of any inconsistency,P2 accuses P0 and then P1 is identified as the TTP. Thecorrectness of Πtrgen follows from BLAZE.

– P0, Pj for j ∈ {1, 2} together sample random Rj ∈ Z2` . P0

sets r = R1 + R2 while Pj sets [r]j = Rj . Pj sets [rd]j as thering element that has last d bits of rj in the last d positions and0 elsewhere.

– P0 locally truncates r to obtain rd and executes Πsh(P0, rd) to

generate JrdK.– P0, P1 set

[rd]1

= βrd − [αrd ]1, while P0, P2 set[rd]2

=− [αrd ]2.

– P0, P1 compute u = [r]1 − 2d[rd]1− [rd]1. P0, P1 jmp-send

H(u)) to P2.– P2 locally computes v = 2d

[rd]2+[rd]2−[r]2. If H(u) 6= H(v),

P2 broadcast "(accuse,P0)" and P1 is chosen as the TTP.

Protocol Πtrgen(P)

Fig. 8: 3PC: Generating Random Truncated Pair (r, rd)

e) Dot Product with Truncation: Given the J·K-sharingof vectors ~x and ~y, protocol Πdotpt (Fig. 20) allows serversto generate JzdK, where zd denotes the truncated value of z =~x � ~y. One naive way is to compute the dot product usingΠdotp first, followed by performing the truncation using the(r, rd) pair. Instead, we follow the optimization of BLAZEwhere the online phase of Πdotp is modified to integrate thetruncation using (r, rd) at no additional cost.

The preprocessing phase now consists of the executionof one instance of Πtrgen (Fig. 8) and the preprocessingcorresponding to Πdotp (Fig. 7). At a high level, the on-line phase proceeds as follows: P0, Pj for j ∈ {1, 2} lo-cally compute [z? − r]j (instead of [β?z ]j as in Πdotp) wherez? = β?z − αz. P0, P1 jmp-send [z? − r]1 to P2 while P0, P2

jmp-send [z? − r]2 to P1. Both P1, P2 then compute (z − r)

locally, truncate it to obtain (z− r)d and execute Πjsh togenerate J(z− r)dK. Finally, servers locally compute the resultas JzdK = J(z− r)dK+ JrdK. We defer the formal details of theprotocol Πdotpt to §B-I.

f) Secure Comparison: Secure comparison allowsservers to check whether x < y, given their J·K-shares. InFPA representation, checking x < y is equivalent to checkingthe msb of v = x − y. Towards this, servers locally compute

JvK = JxK−JyK and extract the msb of v using Πbitext (§B-F1).In case an arithmetic sharing is desired, servers can applyΠbit2A (Fig. 18) protocol on the outcome of Πbitext protocol.

g) Activation Functions: We now elaborate on two ofthe most prominently used activation functions: i) RectifiedLinear Unit (ReLU) and (ii) Sigmoid (Sig).

– ReLU: The ReLU function, relu(v) = max(0, v), can beviewed as relu(v) = b · v, where the bit b = 1 if v < 0 and0 otherwise. Here b denotes the complement of b. Given JvK,servers first execute Πbitext on JvK to generate JbKB. The J·KB-sharing of b is then locally computed by setting βb = 1⊕ βb.Servers then execute ΠBitInj protocol on JbKB and JvK to obtainthe desired result.

– Sig: In this work, we use the MPC-friendly variant of theSigmoid function [1], [4], [6] (ref. §B-J). Note that sig(v) =b1b2(v+1/2)+b2, where b1 = 1 if v+1/2 < 0 and b2 = 1 ifv−1/2 < 0. To compute Jsig(v)K, servers proceed in a similarfashion as the ReLU, and hence, we skip the formal details.

IV. ROBUST 4PC AND PPML

In this section, we extend our 3PC results to the 4-party case and observe substantial efficiency gain. First, theuse of broadcast is eliminated. Second, the preprocessingof multiplication becomes substantially computationally light,eliminating the multiplication protocol altogether. Third andthe most striking of all, we achieve a dot product protocolwith communication cost completely independent of the sizeof the vector, as opposed to its 3PC counterpart (cf. Πdotp (Fig.7)). At the heart of our 4PC constructions lies an efficient 4-party jmp primitive, denoted as jmp4, that allows two serversto robustly send a common value to a third server. We startwith the secret-sharing semantics for 4 parties. We only usean extended version of J·K-sharing defined below.

a) Secret Sharing Semantics: For a value v, the sharesfor P0, P1 and P2 remain the same as that of 3PC case. Thatis, P0 holds ([αv]1 , [αv]2 , βv + γv) while Pi for i ∈ {1, 2}holds ([αv]i , βv, γv). The shares for the fourth server P3 isdefined as ([αv]1 , [αv]2 , γv). Clearly, the secret is defined asv = βv − [αv]1 − [αv]2.

b) 4PC Joint Message Passing Primitive: The jmp4primitive enables two servers Pi, Pj to send a common valuev ∈ Z2` to a third server Pk, or identify a TTP in case ofany inconsistency. This primitive is analogous to jmp (Fig. 2)in spirit but is significantly optimized and free from broadcastcalls. Similar to the 3PC counterpart, each party maintainsa bit and Pi sends the value, and Pj the hash of it to Pk.Pk sets its inconsistency bit to 1 when the (value, hash) pairis inconsistent. This is followed by relaying the bit back tothe senders who exchange it to reach an agreement on theconsistency bit. During this execution, if a party remains silent,then it triggers the recipient to turn on its bit. In case of anyinconsistency, the fourth server, who was not a part of thecomputation, can be employed as the TTP. However, reachingagreement on whether the TTP is established or the protocolterminates successfully needs extra care. For this, Pi, Pj , Pksend their bits to Pl who accepts to be a TTP when at leasttwo parties’ bits are turned on including Pk’s. Pl relays aconfirmation to all and a server accepts her if its inconsistency

10

Page 11: SWIFT: Super-fast and Robust Privacy-Preserving Machine ...Nishat Koti , Mahak Pancholi , Arpita Patra , Ajith Suresh Department of Computer Science and Automation, Indian Institute

bit is on and it receives the confirmation from Pl. Notice thatan honest Pl when it relays a confirmation will always beaccepted and a corrupt Pl’s deliberate attempt to become TTPwill always be rejected.

– Ps for s ∈ {i, j, k} initializes an inconsistency bit bs = 0.– Pi sends v to Pk and Pj sends H(v) to Pk. Pk sets bk = 1 if the

received values are inconsistent, or if either Pi or Pj remainedsilent.

– Pk sends bk to Pi, Pj . Pi sets bi to bk and to 1 when nothingis received from Pk. Similarly, for Pj .

– Pi, Pj mutually exchange their bits. Pi resets bi = bi ∨ bj

where bj is set to 1 when Pj remains silent. Analogously for Pj .– Ps for s ∈ {i, j, k} sends bs to Pl. If Pl receives 1 from at

least two servers among which one is Pk, Pl sets bl = 1 and to0 otherwise. It sends bl to all.

– Ps, for s ∈ {i, j, k}, sets TTP = Pl if bs ∧ bl = 1, terminatesotherwise.

Protocol Πjmp4(Pi, Pj , Pk, v, Pl)

Fig. 9: 4PC: Joint Message Passing Primitive

Notation IV.1. We say that Pi, Pj jmp4-send v to Pk whenthey invoke Πjmp4(Pi, Pj , Pk, v, Pl).

We note that the end goal of jmp4 primitive relates closelyto the bi-convey primitive of FLASH [7]. Bi-convey allowstwo servers S1, S2 to convey a value to a server R, and incase of an inconsistency, a pair of honest servers mutuallyidentify each other, followed by exchanging their internalrandomness to recover the clear inputs, computing the circuit,and sending the output to all. Note, however, that jmp4 prim-itive is more efficient and differs significantly in techniquesfrom the bi-convey primitive. Unlike in bi-convey, in caseof an inconsistency, jmp4 enables servers to unanimouslylearn the TTP’s identity. Moreover, bi-convey demands thathonest servers, identified during an inconsistency, exchangetheir internal randomness (which comprises of the shared keysestablished during the key-setup phase) to proceed with thecomputation. This enforces the need for a fresh key-setup everytime inconsistency is detected. In the efficiency front, jmp4simply halves the communication cost of bi-convey, giving a2× improvement.

A. 4PC Protocols

In this section, we revisit the protocols from 3PC (§III)and suggest optimizations. While we provide details for theprotocols that vary significantly from their 3PC counterpart inthis section, the details for other protocols are deferred to §C.

a) Sharing Protocol: To enable Pi to share a valuev, protocol Πsh4 (Fig. 10) proceeds similar to that of 3PCcase with the addition that P3 also samples the values[αv]1 , [αv]2 , γv using the shared randomness with the respec-tive servers. On a high level, Pi computes βv = v+[αv]1+[αv]2and sends βv (or βv + γv) to another server and they togetherjmp4-send this information to the intended servers.

Preprocessing:

– If Pi = P0 : P0, P3, Pj , for j ∈ {1, 2}, together sample

Protocol Πsh4(Pi, v)

random [αv]j ∈ Z2` , while P sample random γv ∈ Z2` .– If Pi = P1 : P0, P3, P1 together sample random [αv]1 ∈ Z2` ,

while P sample a random [αv]2 ∈ Z2` . Also, P1, P2, P3 samplerandom γv ∈ Z2` .

– If Pi = P2: Analogous to the case when Pi = P1.– If Pi = P3: P0, P3, Pj , for j ∈ {1, 2}, sample random[αv]j ∈ Z2` . P1, P2, P3 together sample random γv ∈ Z2` .

Online:

– If Pi = P0 : P0 computes βv = v + αv and sends βv to P1.P0, P1 jmp4-send βv to P2.

– If Pi = Pj , for j ∈ {1, 2} : Pj computes βv = v + αv, sendsβv to P3−j . P1, P2 jmp4-send βv + γv to P0.

– If Pi = P3: P3 sends βv + γv = v + αv + γv to P0. P3, P0

jmp4-send βv + γv to both P1 and P2.

Fig. 10: 4PC: Generating JvK-shares by server Pi

b) Multiplication Protocol: Given the J·K-shares of xand y, protocol Πmult4 (Fig. 11) allows servers to computeJzK with z = xy. When compared with the state-of-the-art4PC GOD protocol of FLASH [7], our solution improvescommunication in both, the preprocessing and online phase,from 6 to 3 ring elements. Moreover, our communication costmatches with the state-of-the-art 4PC protocol of Trident [8]that provides security with fairness only.

Preprocessing:

– P0, P3, Pj , for j ∈ {1, 2}, sample random [αz]j ∈ Z2` , whileP0, P1, P3 sample random [Γxy]1 ∈ Z2` .

– P1, P2, P3 sample random γz, ψ, r ∈ Z2` and set [ψ]1 = r,[ψ]2 = ψ − r.

– P0, P3 set [Γxy]2 = Γxy − [Γxy]1, where Γxy = αxαy. P0, P3

jmp4-send [Γxy]2 to P2.– P3, Pj , for j ∈ {1, 2}, set [χ]j = γx [αy]j + γy [αx]j + [Γxy]j −[ψ]j . P1, P3 jmp4-send [χ]1 to P0, while P2, P3 jmp4-send [χ]2to P0.

Online:

– P0, Pj , for j ∈ {1, 2}, compute [β?z ]j = −(βx + γx) [αy]j −

(βy + γy) [αx]j + [αz]j + [χ]j .– P1, P0 jmp4-send [β?

z ]1 to P2, while P2, P0 jmp4-send [β?z ]2

to P1.– Pj , for j ∈ {1, 2}, computes β?

z = [β?z ]1 + [β?

z ]2 and setsβz = β?

z + βxβy + ψ.– P1, P2 jmp4-send βz + γz to P0.

Protocol Πmult4(P, JxK, JyK)

Fig. 11: 4PC: Multiplication Protocol (z = x · y)

Recall that the goal of preprocessing in 3PC multiplicationwas to enable P1, P2 obtain ψ, and P0, Pi for i ∈ {1, 2} obtain[χ]i where χ = γxαy+γyαx+Γxy−ψ. Here ψ is a random valueknown to both P1, P2. With the help of P3, we let the serversobtain the respective preprocessing data as follows: P0, P3, P1

together samples random [Γxy]1 ∈ Z2` . P0, P3 locally computeΓxy = αxαy, set [Γxy]2 = Γxy − [Γxy]1 and jmp4-send [Γxy]2to P2. P1, P2, P3 locally sample ψ, r and generate [·]-sharesof ψ by setting [ψ]1 = r and [ψ]2 = ψ − r. Then Pj , P3

for j ∈ {1, 2} compute [χ]j = γx [αy]j + γy [αx]j + [Γxy]j −[ψ]j and jmp4-send [χ]j to P0. The online phase is similarto that of 3PC, apart from Πjmp4 being used instead of Πjmp

for communication. Since P3 is not involved in the online

11

Page 12: SWIFT: Super-fast and Robust Privacy-Preserving Machine ...Nishat Koti , Mahak Pancholi , Arpita Patra , Ajith Suresh Department of Computer Science and Automation, Indian Institute

computation phase, we can safely assume P3 to serve as theTTP for the Πjmp4 executions in the online phase.

c) Reconstruction Protocol: Given JvK, protocol Πrec4

(Fig. 12) enables servers to robustly reconstruct the value vamong the servers. Note that every server lacks one sharefor reconstruction and the same is available with three otherservers. Hence, they communicate the missing share amongthemselves, and the majority value is accepted. As an optimiza-tion, two among the three servers can send the missing sharewhile the third one can send a hash of the same for verification.Notice that, as opposed to the 3PC case, this protocol does notrequire commitments.

Online

– P0 receives γv from P1, P2 and H(γv) from P3.– P1 receives [αv]2 from P2, P3 and H([αv]2) from P0.– P2 receives [αv]1 from P0, P3 and H([αv]1) from P1.– P3 receives βv + γv from P0, P1 and H(βv + γv) from P2.– Pi ∈ P selects the missing share forming the majority among

the values received and reconstructs the output.

Protocol Πrec4(P, JvK)

Fig. 12: 4PC: Reconstruction of v among the servers

d) Input Sharing and Output Reconstruction in SOCSetting: We extend input sharing and reconstruction in theSOC setting as follows. To generate J·K-shares for its input v, Ureceives each of the shares [αv]1 , [αv]2, and γv from three outof the four servers as well as a random value r ∈ Z2` sampledtogether by P0, P1, P2 and accepts the values that form themajority. U locally computes u = v + [αv]1 + [αv]2 + γv + rand sends u to all the servers. Servers then execute a tworound byzantine agreement (BA) [47] to agree on u (or ⊥).On successful completion of BA, P0 computes βv +γv from uwhile P1, P2 compute βv from u locally. For the reconstructionof a value v, servers send their J·K-shares of v to U, whoselects the majority value for each share and reconstructs theoutput. At any point, if a TTP is identified, the servers proceedas follows. All servers send their J·K-share of the input tothe TTP. TTP picks the majority value for each share andcomputes the function output. It then sends this output to U.U also receives the identity of the TTP from all servers andaccepts the output received from the TTP forming majority.

e) Dot Product: Given J·K-shares of two n-sized vectors~x, ~y, protocol Πdotp4 (Fig. 26) enables servers to computeJzK with z = ~x � ~y. The protocol is essentially similar to ninstances of multiplications of the form zi = xiyi for i ∈ [n].But instead of communicating values corresponding to eachof the n instances, servers locally sum up the shares andcommunicate a single value. This technique helps to obtaina communication cost independent of the size of the vectors.

During the preprocessing phase, similar to the multipli-cation protocol P0, P1, P3 sample a random [Γ~x�~y]

1. P0, P3

compute Γ~x�~y =∑ni=1 αxiαyi and jmp4-send [Γ~x�~y]

2=

Γ~x�~y−[Γ~x�~y]1

to P2. P1, P2, P3 sample a random ψ, and gen-erate its [·]-shares locally. Servers P3, Pj for j ∈ {1, 2} thencompute [χ]j =

∑ni=1(γxi [αyi ]j+γyi [αxi ]j)+[Γ~x�~y]

j− [ψ]j ,

and jmp4-send [χ]j to P0.

During the online phase, P0, P1 first compute [β?z ]1

where β?z =∑ni=1 β

?zi directly as [β?z ]1 = −

∑ni=1((βxi +

γxi) [αyi ]1 + (βyi + γyi) [αxi ]1) + [αz]1 + [χ]1. Following this,P0, P1 jmp4-send [β?z ]1 to P2. P0, P2 proceed similarly toenable P1 obtain [β?z ]2. Finally, P1, P2 compute β?z = [β?z ]1 +[β?z ]2 followed by computing βz = β?z +

∑ni=1(βxiβyi) + ψ.

P1, P2 then jmp4-send βz + γz to P0.

V. APPLICATIONS AND BENCHMARKING

In this section, we empirically show the practicality ofour protocols for two widely used applications: BiometricMatching and PPML.

a) Benchmarking Environment: We use a 64-bit ring(Z264 ). The benchmarking is performed over a WAN that wasinstantiated using n1-standard-8 instances of Google Cloud4,with machines located in East Australia (P0), South Asia (P1),South East Asia (P2), and West Europe (P3). The machines areequipped with 2.3 GHz Intel Xeon E5 v3 (Haswell) processorssupporting hyper-threading, with 8 vCPUs, and 30 GB of RAMMemory and with a bandwidth of 50 Mbps. The average round-trip time (rtt) was taken as the time for communicating 1 KBof data between a pair of parties, and the rtt values were

P0-P1 P0-P2 P0-P3 P1-P2 P1-P3 P2-P3

151.40ms 59.95ms 275.02ms 92.94ms 173.93ms 219.37ms

b) Software Details: We implement our protocols5 us-ing the publicly available ENCRYPTO library [48] in C++17.We obtained the code of BLAZE and FLASH from therespective authors and executed them in our environment. Thecollision-resistant hash function was instantiated using SHA-256. We have used multi-threading and our machines werecapable of handling a total of 32 threads. Each experiment isrun for 20 times and the average values are reported.

A. Biometric Matching

Biometric computation is central to many real-worldtasks such as face recognition [49], [50] and fingerprint-matching [51], [52]. The objective is, given a database Dof m biometric samples stored as vectors (~s1, . . . ,~sm) eachof size n, and a user with its own sample ~u, identify the“closest” sample to ~u in D. This task can be accomplishedby considering various distance metrics, the most prominentof which is the Euclidean Distance (ED). In this work, weconsider ED as the metric, and hence the problem boils downto identifying a sample vector in D which has the least ED for~u. Note that, in our setting, each entry in the database D isJ·K-shared among the servers. The client with an input query~u generates J·K- shares of the same along with the servers.

Let xi denote the ith element in the vector ~x. As wasintroduced in [1], ED between two n length vectors ~x, ~yis computed as ED~x~y =

∑i=ni=1(xi − yi)

2 = ~z � ~z where~z = ((x1−y1), . . . , (xn−yn)). Hence, the servers first computeJ·K-shares for vector ~z locally, as JziK = JxiK − JyiK fori ∈ [n], followed by an execution of Πdotp on J~zK, J~zK. Forbiometric computation, the servers create a distance vector DV

4https://cloud.google.com/5The link to our code is not provided respecting the double-blinded

submission policy. The code will be made publicly available once the worksees formal acceptance.

12

Page 13: SWIFT: Super-fast and Robust Privacy-Preserving Machine ...Nishat Koti , Mahak Pancholi , Arpita Patra , Ajith Suresh Department of Computer Science and Automation, Indian Institute

x1 x2

y1

x3 x4 xm−1 xm

y2 ym/2

Fig. 13: Minimum Value - An example

by computing the ED between ~u and every sample vector ~siin D, i.e DVi = ED~u~si for i ∈ [m]. The next task now is tofind the minimum among the m values in DV.

Minimum among m values: Consider vector ~x =(x1,. . . , xm) of size m, where each element is J·K-sharedamong the servers. We follow the standard tree based approachto compute the minimum element. This is as follows. Firstthe elements of the vector are grouped into pairs, whichare then securely compared to find the pairwise minimum.For instance, J·K-shares of (x1, x2), (x3, x4), . . ., (xm−1,xm) are compared to obtain J·K-shares of y1, . . . , ym/2. Let~y = (y1, y2, . . . , ym/2). This process is recursively applied on~y, until a single element is obtained. This requires O (log(m))rounds of recursion to obtain the minimum value in ~x. Notethat the minimum of any two elements, say x1, x2 can becomputed as y1 = b ·(x1−x2)+x2, where b = 0 if x1 > x2, or1, otherwise. This can be achieved using one invocation of bitextraction protocol Πbitext on (x1 − x2) to obtain J·KB-sharesof b, followed by one execution of bit injection ΠBitInj on bB

and (x1 − x2).

Setting Ref.

m = 1024 m = 16384

Pre. Online Pre. Online

Com [KB] R Com [KB] C [KB] R Com [KB]

3PC BLAZE 1127.1 102 151.1 18036.0 142 2419.9SWIFT 1128.3 103 151.9 18037.8 143 2420.7

4PC FLASH 223.9 71 239.8 3583.8 99 3839.8SWIFT 127.1 71 103.2 2035.9 99 1651.9

TABLE III: Minimum ED distance. The values are reported forbiometric samples of size 40.

Table III presents the benchmarking for biometric matchingover 3PC and 4PC setting. Following SecureML [1], we chosethe size of the biometric sample n to be 40. As is evident fromthe Table III, in 3PC, we incur a minimal loss in performanceover BLAZE but guarantee the security of GOD instead offairness. For the case of 4PC, we observe ≈ 2× improvementover the state-of-the-art protocol of FLASH [7] in terms ofcommunication cost.

B. Privacy-preserving Machine Learning

We consider training and inference for Linear Regressionand Logistic Regression and inference for Neural Networks(NN). As pointed out in BLAZE, NN training requires ad-ditional tools to allow mixed world computations, which weleave as future work. We refer readers to SecureML [1],ABY3 [4], and BLAZE [9] for a detailed description ofthe training and inference steps for the aforementioned MLalgorithms. All our benchmarking is done over the publiclyavailable MNIST [53] dataset that has n = 784 features. Fortraining, we used a batch size of B = 128.

In 3PC, we compare our results against the best-knownframework BLAZE in this setting that provides fairness. Ourresults imply that we get GOD at no additional cost comparedto BLAZE. For 4PC, we compare our results with two best-known works FLASH [7] (which is robust) and Trident [8](which is fair). Our results halve the cost of FLASH and areon par with Trident.

1) Benchmarking Parameter: We use throughput (TP) asthe benchmarking parameter following BLAZE and ABY3 [4]as it would help to analyse the effect of improved communica-tion and round complexity in a single shot. Here, TP denotesthe number of operations (“iterations” for the case of trainingand “queries” for the case of inference) that can be performedin unit time. We consider minute as the unit time since mostof our protocols over WAN requires more than a second tocomplete. An iteration in ML training consists of a forwardpropagation phase followed by a backward propagation phase.In the former phase, servers compute the output from the inputswhile in the latter, the model parameters are adjusted accordingto the difference in the computed output and the actual output.The inference can be viewed as one forward propagation ofthe algorithm alone.

2) Logistic Regression: In Logistic Regression, one itera-tion comprises updating the weight vector ~w using the gradientdescent algorithm (GD). It is updated according to the functiongiven below: ~w = ~w− α

BXTi ◦ (sig(Xi ◦ ~w)−Yi) . where α

and Xi denote the learning rate, and a subset of batch size B,randomly selected from the entire dataset in the ith iteration,respectively. The forward propagation comprises of computingthe value Xi ◦ ~w followed by an application of a sigmoidfunction on it. The weight vector is updated in the backwardpropagation, which internally requires the computation of aseries of matrix multiplications, and can be achieved using adot product. The update function can be computed using J·Kshares as: J~wK = J~wK− α

B JXTj K ◦ (sig(JXjK ◦ J~wK)− JYjK).

We summarize our results in Table IV.

Setting Ref.Pre. Online (TP in ×103)

Com [KB] Latency (s) Com [KB] TP

3PCTraining

BLAZE 4757.11 1.17 50.23 2525.36SWIFT 4757.29 1.23 50.31 2393.38

3PCInference

BLAZE 18.69 1.08 0.25 2728.65SWIFT 18.71 1.08 0.28 2727.38

4PCTraining

FLASH 99.09 1.22 88.84 1158.65SWIFT 51.36 1.22 41.23 2407.64

4PCInference

FLASH 0.39 1.05 0.41 2044.01SWIFT 0.21 1.05 0.18 2806.09

TABLE IV: Logistic Regression training and inference. TP is givenin (#it/min) for training and (#queries/min) for inference.

We observe that the online TP, for the case of 3PC,is slightly lower compared to that of BLAZE, though theamortized online communication cost is the same for both.This is because the total number of rounds for both training andinference phase of Logistic Regression is slightly higher in ourcase due to the additional rounds introduced by the verificationmechanism (aka verify phase which also needs broadcast). Thisgap becomes less evident for protocols with more number ofrounds, as is demonstrated in the case of NN (presented next),

13

Page 14: SWIFT: Super-fast and Robust Privacy-Preserving Machine ...Nishat Koti , Mahak Pancholi , Arpita Patra , Ajith Suresh Department of Computer Science and Automation, Indian Institute

where verification for several iterations is clubbed together,making the overhead for verification insignificant.

For the case of 4PC, our solution outperforms FLASH interms of communication as well as throughput. For the case oflogistic regression inference, the improvement over FLASH isnot 2× as claimed theoretically. This stems from the limitedprocessing power in our benchmarking environment and can beaddressed by increasing the processing capacity of the servers.Hence, to be fair, we limit the bandwidth to 34 Mbps andobtain a 2× improvement in TP over FLASH. At 34 Mbps,the TP of FLASH turns out to be 1389.93×103 #queries/min.This highlights the efficiency improvements with reducedcommunication over lower bandwidths. We now compare withTrident [8]. Here, we observe a drop of 12.49% in TP forinference and a drop of 10.91% in TP for training. This isdue to the extra rounds required for verification to achieveGOD. We point out that this drop becomes less significant forprotocols involving more number of rounds, as will be evidentfrom the comparisons for NN inference. Note, however, thatthe loss in TP is traded off with stronger security and thegain in saving the runtime of one server by ≈ 73% owing tothe presence of 2 active servers in our case, as opposed to 3in Trident, thereby resulting in monetary gains in the cloudsetting.

3) NN Inference: In this work, we consider a NN withtwo hidden layers, each consisting of 128 nodes each andan output layer with 10 nodes [4], [9]. Each of the layers isfully connected. Inference in NN requires several dot productcalls followed by an application of the ReLU function. Thisprocess will be carried out for each layer in a sequentialmanner. Table V summarises our benchmarking results for NNinference.

Setting Ref.Pre. Online (TP in ×103)

Com [MB] Latency (s) Com [MB] TP

3PCInference

BLAZE 351.70 2.81 4.91 26.40SWIFT 351.70 2.91 4.91 26.40

4PCInference

FLASH 7.79 2.71 7.79 14.21SWIFT 4.39 2.71 3.35 36.96

TABLE V: NN Inference. TP is given in (#queries/min).

As illustrated in Table V, the performance of our 3PCframework is comparable to BLAZE. In the 4PC setting, whencompared to Trident [8], we observe a minimal loss of 0.36%in the online throughput. The drop surfaces due to the extrarounds involved in our verification. However, as pointed outearlier, the difference in TP closes in due to the large numberof rounds required for computing NN inference, which resultsin amortizing the extra rounds required for verification. Asmentioned earlier, this loss is traded-off with the strongersecurity guarantee and saving in the runtime of one server by≈ 85% owing to the presence of 2 active servers. Moreover,we outperform FLASH in every aspect. This establishes thepractical relevance of our work.

VI. CONCLUSION

In this work, we presented an efficient framework forPPML that achieves the strongest security of GOD or ro-bustness. Our 3PC protocol builds upon the recent work of

BLAZE [9] and achieves almost similar performance albeitimproving the security guarantee. For the case of 4PC, weoutperform the best-known– (a) robust protocol of FLASH [7]by 2× performance-wise and (b) fair protocol of Trident [8]by uplifting its security.

We leave the problem of extending our framework tosupport mixed-world conversions as well as to design protocolsto support algorithms like Decision Trees, k-means Clusteringetc. as open problem. The problem of making the communi-cation cost of dot product entirely independent of the featuresize is another challenging question.

REFERENCES

[1] P. Mohassel and Y. Zhang, “Secureml: A system for scalable privacy-preserving machine learning,” in IEEE S&P, 2017, pp. 19–38.

[2] E. Makri, D. Rotaru, N. P. Smart, and F. Vercauteren, “EPIC: efficientprivate image classification (or: Learning from the masters),” in CT-RSA, 2019, pp. 473–492.

[3] M. S. Riazi, C. Weinert, O. Tkachenko, E. M. Songhori, T. Schneider,and F. Koushanfar, “Chameleon: A hybrid secure computation frame-work for machine learning applications,” in AsiaCCS, 2018, pp. 707–721.

[4] P. Mohassel and P. Rindal, “ABY3: A mixed protocol framework formachine learning,” in ACM CCS, 2018, pp. 35–52.

[5] S. Wagh, D. Gupta, and N. Chandran, “Securenn: 3-party securecomputation for neural network training,” PoPETs, pp. 26–49, 2019.

[6] H. Chaudhari, A. Choudhury, A. Patra, and A. Suresh, “ASTRA:High Throughput 3PC over Rings with Application to SecurePrediction,” in ACM CCSW@CCS, 2019. [Online]. Available: https://eprint.iacr.org/2019/429

[7] M. Byali, H. Chaudhari, A. Patra, and A. Suresh, “FLASH: fast androbust framework for privacy-preserving machine learning,” PETS,2020. [Online]. Available: https://eprint.iacr.org/2019/1365

[8] H. Chaudhari, R. Rachuri, and A. Suresh, “Trident: Efficient 4PCFramework for Privacy Preserving Machine Learning,” NDSS, 2020.[Online]. Available: https://arxiv.org/abs/1912.02631

[9] A. Patra and A. Suresh, “BLAZE: Blazing Fast Privacy-PreservingMachine Learning,” NDSS, 2020. [Online]. Available: https://eprint.iacr.org/2020/042

[10] I. Damgard, V. Pastro, N. P. Smart, and S. Zakarias, “Multipartycomputation from somewhat homomorphic encryption,” in CRYPTO,2012, pp. 643–662.

[11] I. Damgard, M. Keller, E. Larraia, V. Pastro, P. Scholl, and N. P. Smart,“Practical covertly secure MPC for dishonest majority - or: Breakingthe SPDZ limits,” in ESORICS, 2013, pp. 1–18.

[12] M. Keller, P. Scholl, and N. P. Smart, “An architecture for practicalactively secure MPC with dishonest majority,” in ACM CCS, 2013, pp.549–560.

[13] M. Keller, E. Orsini, and P. Scholl, “MASCOT: faster maliciousarithmetic secure computation with oblivious transfer,” in ACM CCS,2016, pp. 830–842.

[14] C. Baum, I. Damgard, T. Toft, and R. W. Zakarias, “Better preprocessingfor secure multiparty computation,” in ACNS, 2016, pp. 327–345.

[15] I. Damgard, C. Orlandi, and M. Simkin, “Yet another compiler for activesecurity or: Efficient MPC over arbitrary rings,” in CRYPTO, 2018, pp.799–829.

[16] R. Cramer, I. Damgard, D. Escudero, P. Scholl, and C. Xing, “Spdz2k :

Efficient MPC mod 2k for dishonest majority,” in CRYPTO, 2018, pp.769–798.

[17] M. Keller, V. Pastro, and D. Rotaru, “Overdrive: Making SPDZ greatagain,” in EUROCRYPT, 2018, pp. 158–189.

[18] D. Bogdanov, S. Laur, and J. Willemson, “Sharemind: A framework forfast privacy-preserving computations,” in ESORICS, 2008, pp. 192–206.

[19] R. Cramer, S. Fehr, Y. Ishai, and E. Kushilevitz, “Efficient multi-partycomputation over rings,” in EUROCRYPT, 2003, pp. 596–613.

14

Page 15: SWIFT: Super-fast and Robust Privacy-Preserving Machine ...Nishat Koti , Mahak Pancholi , Arpita Patra , Ajith Suresh Department of Computer Science and Automation, Indian Institute

[20] D. Demmler, T. Schneider, and M. Zohner, “ABY - A framework forefficient mixed-protocol secure two-party computation,” in NDSS, 2015.

[21] I. Damgard, D. Escudero, T. K. Frederiksen, M. Keller, P. Scholl, andN. Volgushev, “New primitives for actively-secure MPC over rings withapplications to private machine learning,” IEEE S&P, 2019.

[22] T. Araki, J. Furukawa, Y. Lindell, A. Nof, and K. Ohara, “High-throughput semi-honest secure three-party computation with an honestmajority,” in ACM CCS, 2016, pp. 805–817.

[23] T. Araki, A. Barak, J. Furukawa, T. Lichter, Y. Lindell, A. Nof,K. Ohara, A. Watzman, and O. Weinstein, “Optimized honest-majorityMPC for malicious adversaries - breaking the 1 billion-gate per secondbarrier,” in IEEE S&P, 2017, pp. 843–862.

[24] R. Cleve, “Limits on the security of coin flips when half the processorsare faulty (extended abstract),” in ACM STOC, 1986, pp. 364–369.

[25] P. Mohassel, M. Rosulek, and Y. Zhang, “Fast and secure three-partycomputation: The garbled circuit approach,” in ACM CCS, 2015, pp.591–602.

[26] J. Furukawa, Y. Lindell, A. Nof, and O. Weinstein, “High-throughputsecure three-party computation for malicious adversaries and an honestmajority,” in EUROCRYPT, 2017, pp. 225–255.

[27] M. Byali, A. Joseph, A. Patra, and D. Ravi, “Fast secure computationfor small population over the internet,” in ACM CCS, 2018, pp. 677–694.

[28] P. S. Nordholt and M. Veeningen, “Minimising communication inhonest-majority MPC by batchwise multiplication verification,” inACNS, 2018, pp. 321–339.

[29] D. Boneh, E. Boyle, H. Corrigan-Gibbs, N. Gilboa, and Y. Ishai,“Zero-knowledge proofs on secret-shared data via fully linear pcps,”in CRYPTO, 2019, pp. 67–97.

[30] E. Boyle, N. Gilboa, Y. Ishai, and A. Nof, “Practical fully secure three-party computation via sublinear distributed zero-knowledge proofs,” inACM CCS, 2019, pp. 869–886.

[31] A. Patra and D. Ravi, “On the exact round complexity of secure three-party computation,” in CRYPTO, 2018, pp. 425–458.

[32] M. Byali, C. Hazay, A. Patra, and S. Singla, “Fast actively secure five-party computation with security beyond abort,” in ACM CCS, 2019, pp.1573–1590.

[33] K. Chida, D. Genkin, K. Hamada, D. Ikarashi, R. Kikuchi, Y. Lindell,and A. Nof, “Fast large-scale honest-majority MPC for maliciousadversaries,” in CRYPTO, 2018, pp. 34–64.

[34] M. Abspoel, A. P. K. Dalskov, D. Escudero, and A. Nof, “An efficientpassive-to-active compiler for honest-majority MPC over rings,” Cryp-tology ePrint Archive, Report 2019/1298, 2019, https://eprint.iacr.org/2019/1298.

[35] H. Eerikson, M. Keller, C. Orlandi, P. Pullonen, J. Puura, andM. Simkin, “Use your brain! arithmetic 3pc for any modulus withactive security,” Cryptology ePrint Archive, Report 2019/164, 2019,https://eprint.iacr.org/2019/164.

[36] S. D. Gordon, S. Ranellucci, and X. Wang, “Secure computation withlow communication from cross-checking,” in ASIACRYPT, 2018, pp.59–85.

[37] Y. Lindell and B. Pinkas, “Privacy preserving data mining,” J. Cryptol-ogy, pp. 177–206, 2002.

[38] W. Du and M. J. Atallah, “Privacy-preserving cooperative scientificcomputations,” in IEEE CSFW-14, 2001, pp. 273–294.

[39] A. P. Sanil, A. F. Karr, X. Lin, and J. P. Reiter, “Privacy preservingregression modelling via distributed computation,” in ACM SIGKDD,2004, pp. 677–682.

[40] G. Jagannathan and R. N. Wright, “Privacy-preserving distributed k-means clustering over arbitrarily partitioned data,” in ACM SIGKDD,2005, pp. 593–599.

[41] P. Bunn and R. Ostrovsky, “Secure two-party k-means clustering,” inACM CCS, 2007, pp. 486–497.

[42] H. Yu, J. Vaidya, and X. Jiang, “Privacy-preserving SVM classificationon vertically partitioned data,” in PAKDD, 2006, pp. 647–656.

[43] J. Vaidya, H. Yu, and X. Jiang, “Privacy-preserving SVM classification,”Knowl. Inf. Syst., pp. 161–178, 2008.

[44] A. B. Slavkovic, Y. Nardi, and M. M. Tibbits, “Secure logistic regres-sion of horizontally and vertically partitioned distributed databases,” inICDM, 2007, pp. 723–728.

[45] B. Pinkas, M. Rosulek, N. Trieu, and A. Yanai, “Spot-light: Lightweightprivate set intersection from sparse OT extension,” in CRYPTO, 2019,pp. 401–431.

[46] R. Cohen, I. Haitner, E. Omri, and L. Rotem, “Characterization ofsecure multiparty computation without broadcast,” J. Cryptology, pp.587–609, 2018.

[47] M. C. Pease, R. E. Shostak, and L. Lamport, “Reaching agreement inthe presence of faults,” J. ACM, pp. 228–234, 1980.

[48] Cryptography and P. E. G. at TU Darmstadt, “ENCRYPTO Utils,” https://github.com/encryptogroup/ENCRYPTO utils, 2017.

[49] Z. Erkin, M. Franz, J. Guajardo, S. Katzenbeisser, I. Lagendijk, andT. Toft, “Privacy-preserving face recognition,” in PETS, 2009, pp. 235–253.

[50] W. Henecka, S. Kogl, A. Sadeghi, T. Schneider, and I. Wehrenberg,“TASTY: tool for automating secure two-party computations,” in ACMCCS, 2010, pp. 451–462.

[51] M. Blanton and P. Gasti, “Secure and efficient protocols for iris andfingerprint identification,” in ESORICS, 2011, pp. 190–209.

[52] W. Henecka and T. Schneider, “Faster secure two-party computationwith less memory,” in ASIA CCS, 2013, pp. 437–446.

[53] Y. LeCun and C. Cortes, “MNIST handwritten digit database,” 2010.[Online]. Available: http://yann.lecun.com/exdb/mnist/

[54] G. Asharov and Y. Lindell, “A full proof of the BGW protocol forperfectly secure multiparty computation,” J. Cryptology, pp. 58–151,2017.

APPENDIX APRELIMINARIES

A. Shared Key Setup

Let F : {0, 1}κ×{0, 1}κ → X be a secure pseudo-randomfunction (PRF), with co-domain X being Z2` . The set of keysestablished between the servers for 3PC is as follows:

– One key shared between every pair– k01, k02, k12 for theparties (P0, P1), (P0, P2)and(P1, P2), respectively.

– One shared key known to all the servers– kP .

Suppose P0, P1 wish to sample a random value r ∈ Z2`

non-interactively. To do so they invoke Fk01(id01) and obtainr. Here, id01 denotes a counter maintained by the servers, andis updated after every PRF invocation. The appropriate keysused to sample is implicit from the context, from the identitiesof the pair that sample or from the fact that it is sampled byall, and, hence, is omitted.

Fsetup interacts with the servers in P and the adversary S. Fsetup

picks random keys kij for i, j ∈ {0, 1, 2} and kP . Let ys denotethe keys corresponding to server Ps. Then– ys = (k01, k02 and kP) when Ps = P0.– ys = (k01, k12 and kP) when Ps = P1.– ys = (k02, k12 and kP) when Ps = P2.

Output: Send (Output, ys) to every Ps ∈ P .

Functionality Fsetup

Fig. 14: 3PC: Ideal functionality for shared-key setup

The key setup is modelled via a functionality Fsetup (Fig.14) that can be realised using any secure MPC protocol.Analogously, the key setup functionality for 4PC is given inFig. 15.

15

Page 16: SWIFT: Super-fast and Robust Privacy-Preserving Machine ...Nishat Koti , Mahak Pancholi , Arpita Patra , Ajith Suresh Department of Computer Science and Automation, Indian Institute

Fsetup4 interacts with the servers in P and the adversary S. Fsetup4

picks random keys kij and kijk for i, j, k ∈ {0, 1, 2} and kP . Letys denote the keys corresponding to server Ps. Then– ys = (k01, k02, k03, k012, k013, k023 and kP) when Ps = P0.– ys = (k01, k12, k13, k012, k013, k123 and kP) when Ps = P1.– ys = (k02, k12, k23, k012, k023, k123 and kP) when Ps = P2.– ys = (k03, k13, k23, k013, k023, k123 and kP) when Ps = P3.

Output: Send (Output, ys) to every Ps ∈ P .

Functionality Fsetup4

Fig. 15: 4PC: Ideal functionality for shared-key setup

B. Collision Resistant Hash Function

Consider a hash function family H = K×L → Y . The hashfunction H is said to be collision resistant if, for all probabilis-tic polynomial-time adversaries A, given the description of Hkwhere k ∈R K, there exists a negligible function negl() suchthat Pr[(x1, x2) ← A(k) : (x1 6= x2) ∧ Hk(x1) = Hk(x2)] ≤negl(κ), where m = poly(κ) and x1, x2 ∈R {0, 1}m.

C. Commitment Scheme

Let Com(x) denote the commitment of a value x. The com-mitment scheme Com(x) possesses two properties; hiding andbinding. The former ensures privacy of the value v given just itscommitment Com(v), while the latter prevents a corrupt partyfrom opening the commitment to a different value x′ 6= x.The practical realization of a commitment scheme is via ahash function H() given below, whose security can be provedin the random-oracle model (ROM)– for (c, o) = (H(x||r),x||r) = Com(x; r).

APPENDIX B3PC PROTOCOLS

In this section, we provide a detailed communication costanalysis for our protocols in the 3PC setting. Also detailedinformation regarding some of the protocols are provided.

A. Joint Message Passing

Lemma B.1 (Communication). Protocol Πjmp (Fig. 2) requires1 round and an amortized communication of ` bits overall.

Proof: Server Pi sends value v to Pk while Pj sendshash of the same to Pk. This accounts for one round andcommunication of ` bits. Pk then sends back its inconsistencybit to Pi, Pj , who then exchange it; this takes another tworounds. This is followed by servers broadcasting hashes ontheir values and selecting a TTP based on it, which takesone more round. All except the first round can be combinedfor several instances of Πjmp protocol and hence the cost getsamortized.

B. Sharing Protocol

Lemma B.2 (Communication). Protocol Πsh (Fig. 3) is non-interactive in the preprocessing phase and requires 2 roundsand an amortized communication of 2` bits in the online phase.

Proof: During the preprocessing phase, servers non-interactively sample the [·]-shares of αv and γv values using

the shared key setup. In the online phase, when Pi = P0, itcomputes βv and sends it to P1, resulting in one round and` bits communicated. They then jmp-send βv to P2, whichrequires additional one round in an amortized sense, and `bits to be communicated. For the case when Pi = P1, it sendsβv to P2, resulting in one round and a communication of `bits. Then, P1, P2 jmp-send βv +γv to P0. This again requiresan additional one round and ` bits. The analysis is similar inthe case of Pi = P2.

C. Joint Sharing Protocol

The formal details for Πjsh protocol appears in Fig.16.

Preprocessing:

– If (Pi, Pj) = (P1, P0): Servers execute the preprocessing ofΠsh(P1, v) and then locally set γv = 0.

– If (Pi, Pj) = (P2, P0): Similar to the case above.– If (Pi, Pj) = (P1, P2): P1, P2 together sample random γv ∈Z2` . Servers locally set [αv]1 = [αv]2 = 0.

Online:

– If (Pi, Pj) = (P1, P0): P0, P1 compute βv = v +[αv]1 +[αv]2.P0, P1 jmp-send βv to P2.

– If (Pi, Pj) = (P2, P0): Similar to the case above.– If (Pi, Pj) = (P1, P2): P1, P2 locally set βv = v. P1, P2

jmp-send βv + γv to P0.

Protocol Πjsh(Pi, Pj , v)

Fig. 16: 3PC: J·K-sharing of a value v ∈ Z2` jointly by Pi, Pj

When the value v is available to both Pi, Pj in thepreprocessing phase, protocol Πjsh can be made non-interactivein the following way: P sample a random r ∈ Z2` and locallyset their share according to Table VI.

(P1, P2) (P1, P0) (P2, P0)

[αv]1 = 0, [αv]2 = 0

βv = v, γv = r − v

[αv]1 = −v, [αv]2 = 0

βv = 0, γv = r

[αv]1 = 0, [αv]2 = −v

βv = 0, γv = r

P0

P1

P2

(0, 0, r )

(0, v, r − v)

(0, v, r − v)

(−v, 0, r)

(−v, 0, r)

( 0, 0, r)

(0, − v, r)

(0, 0, r)

(0, − v, r)

TABLE VI: The columns depict the three distinct possibility ofinput contributing pairs. The first row shows the assignment to variouscomponents of the sharing. The last row, along with three sub-rows,specify the shares held by the three servers.

Lemma B.3 (Communication). Protocol Πjsh (Fig. 16) is non-interactive in the preprocessing phase and requires 1 roundand an amortized communication of ` bits in the online phase.

Proof: In this protocol, servers execute Πjmp protocolonce. Hence the overall cost follows from that of an instanceof the Πjmp protocol (Lemma B.1).

D. Multiplication Protocol

Lemma B.4 (Communication). Protocol Πmult (Fig. 4) re-quires an amortized cost of 3` bits in the preprocessing phase,and 1 round and amortized cost of 3` bits in the online phase.

16

Page 17: SWIFT: Super-fast and Robust Privacy-Preserving Machine ...Nishat Koti , Mahak Pancholi , Arpita Patra , Ajith Suresh Department of Computer Science and Automation, Indian Institute

Proof: In the preprocessing phase, generation of αz andγz are non-interactive. This is followed by one execution ofΠmulPre, which requires an amortized communication cost of3` bits. During the online phase, P0, P1 jmp-send [β?z ]1 toP2, while P0, P2 jmp-send [β?z ]2 to P1. This requires oneround and a communication of 2` bits. Following this, P1, P2

jmp-send βz + γz to P0, which requires one round and acommunication of ` bits. However, jmp-send of βz + γz canbe delayed till the end of the protocol, and will require onlyone round for the entire circuit and can be amortized.

The ideal functionality for ΠmulPre appears in Fig.17.

FMulPre interacts with the servers in P and the adversary S. FMulPre

receives 〈·〉-shares of d, e from the servers where Ps, for s ∈{0, 1, 2}, holds 〈d〉s = (ds, d(s+1)%3) and 〈e〉s = (es, e(s+1)%3)such that d = d0 + d1 + d2 and e = e0 + e1 + e2. Let Pi denotesthe server corrupted by S. FMulPre receives 〈f〉i = (fi, f(i+1)%3)from S where f = de. FMulPre proceeds as follows:– Reconstructs d, e using the shares received from honest servers

and compute f = de.– Compute f(i+2)%3 = f− fi− f(i+1)%3 and set the output shares

as 〈f〉0 = (f0, f1), 〈f〉1 = (f1, f2), 〈f〉2 = (f2, f0).– Send (Output, 〈f〉s) to server Ps ∈ P .

Functionality FMulPre

Fig. 17: 3PC: Ideal functionality for ΠmulPre protocol

E. Reconstruction Protocol

Lemma B.5 (Communication). Protocol Πrec (Fig. 5) requires1 round and a communication of 6` bits in the online phase.

Proof: The preprocessing phase consists of communi-cation of commitment values using the Πjmp protocol. Thehash-based commitment scheme allows generation of a singlecommitment for several values and hence the cost gets amor-tised away for multiple instances. During the online phase,each server receives an opening for the commitment fromother two servers, which requires one round and an overallcommunication of 6` bits.

F. Special protocols

Here we provide details regarding the special protocols -i) Bit Extraction, ii) Bit2A, and iii) Bit Injection.

1) Bit Extraction protocol: Protocol Πbitext allows serversto compute the boolean sharing of the most significant bit(msb) of a value v given its arithmetic sharing JvK. To computethe msb, we use the optimized 2-input Parallel Prefix Adder(PPA) boolean circuit proposed by ABY3 [4]. The PPA circuitconsists of 2` − 2 AND gates and has a multiplicative depthof log `. Let v0 = βv, v1 = − [αv]1 and v2 = − [αv]2.

P0 P1 P2

Jv0[i]KB (0, 0, 0) (0, v0[i], v0[i]) (0, v0[i], v0[i])Jv1[i]KB (v1[i], 0, 0) (v1[i], 0, 0) (0, 0, 0)Jv2[i]KB (0, v2[i], 0) (0, 0, 0) (0, v2[i], 0)

TABLE VII: The J·KB-sharing corresponding to ith bit of v0 =βv, v1 = − [αv]1 and v2 = − [αv]2. Here i ∈ {0, . . . , `− 1}.

Then v = v0 + v1 + v2. Servers first locally compute the

boolean shares corresponding to each bit of the values v0, v1and v2 according to Table VII. It has been shown in ABY3that v = v0+v1+v2 can also be expressed as v = 2c+s whereFA(v0[i], v1[i], v2[i]) → (c[i], s[i]) for i ∈ {0, . . . , ` − 1}.Here FA denotes a Full Adder circuit while s and c denotethe sum and carry bits respectively. To summarize, serversexecute ` instances of FA in parallel to compute JcKB andJsKB. The FA’s are executed independently and require oneround of communication. The final result is then computed asmsb(2JcKB + JsKB) using the optimized PPA circuit.

Lemma B.6 (Communication). Protocol Πbitext requires acommunication cost of 9`− 6 bits in the preprocessing phaseand require log `+1 rounds and an amortized communicationof 9`− 6 bits in the online phase.

Proof: In Πbitext, first round comprises of ` FullAdder (FA) circuits executing in parallel, each comprising ofsingle AND gate. This is followed by the execution of theoptimized PPA circuit of ABY3 [4], which comprises of 2`−2AND gates and has a multiplicative depth of log `. Hence thecommunication cost follows from the multiplication for 3`−2AND gates.

2) Bit2A Conversion protocol: Given the boolean sharingof a bit b, denoted as JbKB, protocol Πbit2A (Fig. 18) al-lows servers to compute the arithmetic sharing JbRK. HerebR denotes the equivalent value of b over ring Z2` (seeNotation II.2). As pointed out in BLAZE, bR = (βb ⊕αb)

R = βRb + αR

b − 2βRb α

Rb . Also αR

b = ([αb]1 ⊕ [αb]2)R =

[αb]R1 + [αb]

R2 − 2 [αb]

R1 [αb]

R2 . During the preprocessing phase,

P0, Pj for j ∈ {1, 2} execute Πjsh on [αb]Rj to generate

J[αb]Rj K. Servers then execute Πmult on J[αb]

R1 K and J[αb]

R2 K

to generate J[αb]R1 [αb]

R2 K followed by locally computing JαR

b K.During the online phase, P1, P2 execute Πjsh on βR

b to jointlygenerate JβR

b K. Servers then execute Πmult protocol on JβRb K

and JαRb K to compute JβR

b αRb K followed by locally computing

bR. The formal details for Πbit2A protocol appears in Fig.18.

Preprocessing:

– P0, Pj for j ∈ {1, 2} execute Πjsh on [αb]Rj to generate J[αb]

Rj K.

– Servers execute Πmult(P, [αb]R1 , [αb]

R2) to generate JuK where

u = [αb]R1 [αb]

R2 , followed by locally computing JαR

b K =J[αb]

R1K + J[αb]

R2K− 2JuK.

– Servers in P execute the preprocessing phase ofΠmult(P, βR

b , αRb ) where v = βR

b αRb .

Online:

– P1, P2 execute Πjsh(P1, P2, βRb ) to generate JβR

b K.– Servers execute online phase of Πmult(P, βR

b , αRb ) to generate

JvK where v = βRb α

Rb , followed by locally computing JbRK =

JβRb K + JαR

b K− 2JvK.

Protocol Πbit2A(P, JbKB)

Fig. 18: 3PC: Bit2A Protocol

Lemma B.7 (Communication). Protocol Πbit2A (Fig. 18)requires an amortized communication cost of 9` bits in thepreprocessing phase and requires 1 round and an amortizedcommunication of 4` bits in the online phase.

17

Page 18: SWIFT: Super-fast and Robust Privacy-Preserving Machine ...Nishat Koti , Mahak Pancholi , Arpita Patra , Ajith Suresh Department of Computer Science and Automation, Indian Institute

Proof: In the preprocessing phase, servers run two in-stances of Πjsh, which can be done non-interactively (ref.§ B-C). This is followed by an execution of entire multipli-cation protocol, which requires 6` bits to be communicated(Lemma B.4). Parallelly, the servers execute the preprocessingphase of Πmult, resulting in an additional 3` bits of com-munication (Lemma B.4). During the online phase, P1, P2

execute Πjsh once, which requires one round and ` bits tobe communicated. In Πjsh, the communication towards P0

can be deferred till the end, thereby requiring a single roundfor multiple instances. This is followed by an execution ofthe online phase of Πmult, which requires one round and acommunication of 3` bits.

3) Bit Injection protocol: Given the binary sharing of a bitb, denoted as JbKB, and the arithmetic sharing of v ∈ Z2` ,protocol ΠBitInj computes J·K-sharing of bv. Towards this,servers first execute Πbit2A on JbKB to generate JbK. Thisis followed by servers computing JbvK by executing Πmult

protocol on JbK and JvK.

Lemma B.8 (Communication). Protocol ΠBitInj requires anamortized communication cost of 12` bits in the preprocessingphase and requires 2 rounds and an amortized communicationof 7` bits in the online phase.

Proof: Protocol ΠBitInj is essentially an execution ofΠbit2A (Lemma B.6) followed by one invocation of Πmult

(Lemma B.4) and the costs follow.

G. Dot Product Protocol

The ideal world functionality for realizing ΠdotpPre ispresented in Fig. 19.

FDotPPre interacts with the servers in P and the adversary S.FDotPPre receives 〈·〉-shares of vectors ~d = (d1, . . . , dn),~e =(e1, . . . , en) from the servers. Server Ps, for s ∈ {0, 1, 2}, holds〈dj〉s = ((dj)s, (dj)(s+1)%3) and 〈ej〉s = ((ej)s, (ej)(s+1)%3)such that dj = (dj)0+(dj)1+(dj)2 and ej = (ej)0+(ej)1+(ej)2

where j ∈ [n]. Let Pi denotes the server corrupted by S. FMulPre

receives 〈f〉i = (fi, f(i+1)%3) from S where f = ~d � ~e. FDotPPre

proceeds as follows:– Reconstructs dj , ej , for j ∈ [n], using the shares received from

honest servers and compute f =∑n

j=1 djej .– Compute f(i+2)%3 = f− fi− f(i+1)%3 and set the output shares

as 〈f〉0 = (f0, f1), 〈f〉1 = (f1, f2), 〈f〉2 = (f2, f0).– Send (Output, 〈f〉s) to server Ps ∈ P .

Functionality FDotPPre

Fig. 19: 3PC: Ideal functionality for ΠdotpPre protocol

Lemma B.9 (Communication). Protocol Πdotp (Fig. 7) re-quires an amortized communication of 3n` bits in the pre-processing phase and requires 1 round and an amortizedcommunication of 3` bits in the online phase, where n denotesthe size of the underlying vectors.

Proof: During the preprocessing phase, servers executethe preprocessing phase of Πmult corresponding to each of then multiplications in parallel, resulting in a communication of3n` bits (Lemma B.4). The online phase follows similarly tothat of Πmult, the only difference being that servers combinetheir shares corresponding to all the n multiplications into one

and then exchange. This requires one round and an amortizedcommunication of 3` bits.

H. Truncation

Lemma B.10 (Communication). Protocol Πtrgen (Fig. 8) re-quires 2 rounds and an amortized communication of 2` bits.

Proof: In Πtrgen, the additive shares of r are sam-pled non-interactively. P0 then executes Πsh protocol on rd,which requires two rounds and a communication of 2` bits(Lemma B.2). P0, P1 then jmp-send the hash of u, followed bya broadcast from P2. Note that the cost of broadcast, and Πjmp

(as it involves sending a hash), gets amortized over multipleinstances.

I. Dot Product with Truncation

The formal details for Πdotpt protocol appears in Fig.20.

Preprocessing:

– Servers execute the preprocessing of Πdotp(P, {JxiK, JyiK}i∈[n]).– In parallel, servers execute Πtrgen(P) to generate the truncation

pair ([r] , JrdK). Also, P0 obtains both the values [r]1 and [r]2.

Online:

– P0, Pj , for j ∈ {1, 2}, compute [Ψ]j = −∑n

i=1((βxi +γxi) [αyi ]j + (βyi + γyi) [αxi ]j) − [r]j and set [(z− r)?]j =[Ψ]j + [χ]j .

– P1, P0 jmp-send [(z− r)?]1 to P2 and P2, P0 jmp-send[(z− r)?]2 to P1.

– P1, P2 locally compute (z− r)? = [(z− r)?]1 + [(z− r)?]2 andset (z− r) = (z− r)? +

∑ni=1(βxiβyi) + ψ.

– P1, P2 locally truncate (z − r) to obtain (z− r)d and executeΠjsh(P1, P2, (z− r)d) to generate J(z− r)dK.

– Servers locally compute JzK = J(z− r)dK + JrdK .

Protocol Πdotpt(P, {JxiK, JyiK}i∈[n])

Fig. 20: 3PC: Dot Product Protocol with Truncation

Lemma B.11 (Communication). Protocol Πdotpt (Fig. 20)requires an amortized communication of 3n` + 2` bits in thepreprocessing phase and requires 1 round and an amortizedcommunication of 3` bits in the online phase.

Proof: During the preprocessing phase, servers executethe preprocessing phase of Πdotp, resulting in a communicationof 3n` bits (Lemma B.9). In parallel, servers execute oneinstance of Πtrgen protocol resulting in an additional commu-nication of 2` bits (Lemma B.10).

The online phase follows from that of Πdotp protocolexcept that, now, P1, P2 compute additive shares of z − r,where z = ~x � ~y, which is achieved using two executionsof Πjmp in parallel. This requires one round and an amortizedcommunication cost of 2` bits. P1, P2 then jointly share thetruncated value of z−r with P0, which requires one round and `bits. However, this step can be deferred till the end for multipledot product with truncation instances, which amortizes thecost.

18

Page 19: SWIFT: Super-fast and Robust Privacy-Preserving Machine ...Nishat Koti , Mahak Pancholi , Arpita Patra , Ajith Suresh Department of Computer Science and Automation, Indian Institute

J. Activation Functions

Lemma B.12 (Communication). Protocol relu requires anamortized communication of 21` − 6 bits in the preprocess-ing phase and requires log ` + 4 rounds and an amortizedcommunication of 16`− 6 bits in the online phase.

Proof: One instance of relu protocol comprises of execu-tion of one instance of Πbitext, followed by ΠBitInj. The cost,therefore, follows from Lemma B.6, and Lemma B-F3.

The formal details of the MPC-friendly variant of theSigmoid function [1], [4], [6] is given below:

sig(v) =

0 v < − 12

v + 12 − 1

2 ≤ v ≤ 12

1 v > 12

Lemma B.13 (Communication). Protocol sig requires anamortized communication of 39` − 9 bits in the preprocess-ing phase and requires log ` + 4 rounds and an amortizedcommunication of 29`− 9 bits in the online phase.

Proof: An instance of sig protocol involves the executionof the following protocols in order– i) two parallel instancesof Πbitext protocol, ii) once instance of Πmult protocol overboolean value, and iii) one instance of ΠBitInj and Πbit2A inparallel. The cost follows from Lemma B.6, Lemma B.7 andLemma B.8.

APPENDIX C4PC PROTOCOLS

In this section, we give the formal details for the 4PCprotocols along with communication cost analysis.

A. 4PC Joint Message Passing Primitive

The ideal functionality for jmp4 primitive appears in Fig.21.

Fjmp4 interacts with the servers in P and the adversary S.Step 1: Fjmp receives (Input, vs) from senders Ps for s ∈ {i, j},(Input,⊥) from receiver Pk and fourth server Pl, while itreceives (Select, ttp) from S. Here ttp is a boolean value, witha 1 indicating that TTP = Pl should be established.

Step 2: If vi = vj and ttp = 0, or if S has corrupted Pl, setmsgi = msgj = msgl = ⊥,msgk = vi and go to Step 4.

Step 3: Else : Set msgi = msgj = msgk = msgl = Pl.Step 4: Send (Output,msgs) to Ps for s ∈ {0, 1, 2, 3}.

Functionality Fjmp4

Fig. 21: 4PC: Ideal functionality for jmp4 primitive

Lemma C.1 (Communication). Protocol Πjmp4 (Fig. 9) re-quires 1 round and an amortized communication of ` bits inthe online phase.

Proof: Server Pi sends the value v to Pk while Pj sendshash of the same to Pk. This accounts for one round ofcommunication. Values sent by Pj for several instances can beconcatenated and hashed to obtain a single value. Hence thecost of sending the hash gets amortized over multiple instances.Similarly, the two round exchange of inconsistency bits, alongwith the two round signalling to the TTP can be combined

for multiple instances, thereby amortizing this cost. Thus, theamortized cost of this protocol is ` bits.

B. Sharing Protocol

Lemma C.2 (Communication). In the online phase, Πsh4 (Fig.10) requires 2 rounds and an amortized communication of 2`bits when P0, P1, P2 share a value, whereas it requires anamortized communication of 3` bits when P3 shares a value.

Proof: The proof for P0, P1, P2 sharing a value followsfrom B.2. For the case when P3 wants to share a value v,it first sends βv + γv to P0 which requires one round and `bits of communication. This is followed by 2 parallel callsto Πjmp4 which together require one round and an amortizedcommunication of 2` bits.

C. Joint Sharing Protocol

Protocol Πjsh4 enables a pair of (unordered) servers(Pi, Pj) to jointly generate a J·K-sharing of value v ∈ Z2`

known to both of them. In case of an inconsistency, the serveroutside the computation serves as a TTP. The protocol isdescribed in Fig. 22.

Preprocessing:

– If (Pi, Pj) = (P1, P2) : P1, P2, P3 sample γv ∈ Z2` . Serverslocally set [αv]1 = [αv]2 = 0.

– If (Pi, Pj) = (Ps, P0), for s ∈ {1, 2} : Servers execute thepreprocessing of Πsh4(Ps, v). Servers locally set γv = 0.

– If (Pi, Pj) = (Ps, P3), for s ∈ {0, 1, 2} : Servers execute thepreprocessing of Πsh4(Ps, v).

Online:

– If (Pi, Pj) = (P1, P2) : P1, P2 set βv = v and jmp4-sendβv + γv to P0.

– If (Pi, Pj) = (Ps, P0), for s ∈ {1, 2, 3} : Ps, P0 computeβv = v + [αv]1 + [αv]2 and jmp4-send βv to Pk, where (k ∈{1, 2}) ∧ (k 6= s).

– If (Pi, Pj) = (Ps, P3), for s ∈ {1, 2}: P3, Ps compute βv andβv + γv. Ps, P3 jmp4-send βv to Pk, where (k ∈ {1, 2})∧ (k 6=s). In parallel, Ps, P3 jmp4-send βv + γv to P0.

Protocol Πjsh4(Pi, Pj , v)

Fig. 22: 4PC: J·K-sharing of a value v ∈ Z2` jointly by Pi, Pj

When P3, P0 want to jointly share a value v which isavailable in the preprocessing phase, protocol Πjsh4 can beperformed with a single element of communication (as op-posed to 2 elements in Fig. 22). P0, P3 can jointly share v asfollows. P0, P3, P1 sample a random r ∈ Z2` and set [αv]1 = r.P0, P3 set [αv]2 = −(r + v) and jmp4-send [αv]2 to P2. Thisis followed by servers locally setting γv = βv = 0.

We further observe that servers can generate a J·K-sharingof v non-interactively when v is available with P0, P1, P2. Todo this, servers set [αv]1 = [αv]2 = γv = 0 and βv = v.We abuse notation and use Πjsh4(P0, P1, P2, v) to denote thissharing.

Lemma C.3 (Communication). In the online phase, Πjsh4 (Fig.22) requires 1 round and an amortized communication of 2`bits when (P3, Ps) for s ∈ {0, 1, 2} share a value, and requiresan amortized communication of ` bits, otherwise.

19

Page 20: SWIFT: Super-fast and Robust Privacy-Preserving Machine ...Nishat Koti , Mahak Pancholi , Arpita Patra , Ajith Suresh Department of Computer Science and Automation, Indian Institute

Proof: When (P3, Ps) for s ∈ {0, 1, 2} want to share avalue v, there are two parallel calls to Πjmp4 which requires anamortized communication of 2` bits and one round. In the othercases, Πjmp4 is invoked only once, resulting in an amortizedcommunication of ` bits.

D. 〈·〉-sharing Protocol

In some protocols, scenarios arise where P3 is required togenerate 〈·〉-sharing of a value v in the preprocessing phase,where 〈·〉-sharing of v is same as that defined in 3PC (wherev = v0 + v1 + v2, and P0 possesses (v0, v1), P1 possesses(v1, v2), and P2 possess (v2, v0)) with the addition that P3 nowpossesses (v0, v1, v2). We call the resultant protocol Πash4 andit appears in Fig. 23.

Preprocessing :

– Servers P0, P3, P1 sample a random v1 ∈ Z2` , while serversP0, P3, P2 sample a random v0 ∈ Z2` .

– P3 computes v2 = v − v0 − v1 and sends v2 to P2. P3, P2

jmp4-send v2 to P1.

Protocol Πash4(P3, v)

Fig. 23: 4PC: 〈·〉-sharing of value v by P3

Note that servers can locally convert 〈v〉 to JvK by settingtheir shares as shown in Table VIII.

P0 P1 P2 P3

JvK (−v1,−v0, 0) (−v1, v2,−v2) (−v0, v2,−v2) (−v0,−v1,−v2)

TABLE VIII: Local conversion of shares from 〈·〉-sharing to J·K-sharing for a value v. Here, [αv]1 = −v1, [αv]2 = −v0, βv = v2, γv =−v2.

Lemma C.4 (Communication). Protocol Πash4 (Fig. 23) re-quires 2 rounds and an amortized communication of 2` bits.

Proof: Communicating v2 to P2 requires ` bits and 1round. This is followed by one invocation of Πjmp4 which re-quires ` bits and 1 round. Thus, the amortized communicationcost is 2` bits and two rounds.

E. Multiplication Protocol

Lemma C.5 (Communication). Πmult4 (Fig. 11) requires anamortized communication of 3` bits in the preprocessing phase,and 1 round with an amortized communication of 3` bits inthe online phase.

Proof: In the preprocessing phase, the servers executeΠjmp4 to jmp4-send [Γxy]2 to P2 resulting in amortized com-munication of ` bits. This is followed by 2 parallel invocationsof Πjmp4 to jmp4-send [χ]1 , [χ]2 to P0 which require anamortized communication of 2` bits. Thus, the amortizedcommunication cost in preprocessing is 3` bits. In the onlinephase, there are 2 parallel invocations of Πjmp4 to jmp4-send[β?z ]1 , [β

?z ]2 to P2, P1, respectively, which requires amortized

communication of 2` bits and one round. This is followedby another call to Πjmp4 to jmp4-send βz + γz to P0 whichrequires one more round and amortized communication of `bits. However, jmp4-send of βz+γz can be delayed till the end

of the protocol, and will require only one round for multiplemultiplication gates and hence, can be amortized. Thus, thetotal number of rounds required for multiplication in the onlinephase is one with an amortized communication of 3` bits.

F. Reconstruction Protocol

Lemma C.6 (Communication). Πrec4 (Fig. 12) requires anamortized communication of 8` bits and 1 round in the onlinephase.

Proof: Each Ps for s ∈ {0, 1, 2, 3} receives the missingshare in clear from two other servers, while the hash of itfrom the third. As before, the missing share sent by the thirdserver can be concatenated over multiple instances and hashedto obtain a single value. Thus, the amortized communicationcost is 2` bits per server, resulting in a total cost of 8` bits.

G. Special protocols

Here we provide details regarding the special protocols -i) Bit Extraction, ii) Bit2A, and iii) Bit Injection.

1) Bit Extraction Protocol: This protocol enables theservers to compute a boolean sharing of the most significantbit (MSB) of a value v ∈ Z2` given the arithmetic sharing JvK.To compute the MSB, we use the optimized Parallel PrefixAdder (PPA) circuit from ABY3 [4], which takes as inputtwo boolean values and outputs the MSB of the sum of theinputs. The circuit requires 2(` − 1) AND gates and has amultiplicative depth of log `. The protocol for bit extraction(Πbitext4) involves computing the boolean PPA circuit usingthe protocols described in §IV. The two inputs to this booleancircuit are generated as follows. The value v whose MSB needsto be extracted can be represented as the sum of two valuesas v = βv + (−αv) where the first input to the circuit willbe βv and the second input will be −αv. Since βv is heldby P1, P2, servers execute Πjsh4 to generate JβvKB. Similarly,P0, P3 possess αv, and servers execute Πjsh4 to generateJ−αvKB. Servers in P use the J·KB-shares of these two inputs(βv,−αv) to compute the optimized PPA circuit which outputsthe Jmsb(v)KB.

Lemma C.7 (Communication). The protocol Πbitext4 requiresan amortized communication of 7`−6 bits in the preprocessingphase, and log ` rounds with amortized communication of 7`−6 bits in the online phase.

Proof: Generation of boolean sharing of αv requires `bits in the preprocessing phase (since Πjsh4 with P0, P3 canbe achieved with ` bits of communication in the preprocessingphase), and generation of boolean sharing of βv requires `bits and one round (which can be deferred towards the end ofthe protocol thereby requiring one round for several instances)in the online phase. Further, the boolean PPA circuit to becomputed requires 2(` − 1) AND gates. Since each ANDgate requires Πmult4 to be executed, it requires an amortizedcommunication of 6`− 6 bits in both the preprocessing phaseand the online phase. Thus, the overall communication is 7`−6bits, in both, the preprocessing and online phase. The circuithas a multiplicative depth of log ` which results in log ` roundsin the online phase.

20

Page 21: SWIFT: Super-fast and Robust Privacy-Preserving Machine ...Nishat Koti , Mahak Pancholi , Arpita Patra , Ajith Suresh Department of Computer Science and Automation, Indian Institute

2) Bit2A Protocol: This protocol enables servers to com-pute the arithmetic sharing of a bit b given its boolean sharing.Let bR denote the value of b in the ring Z2` . We observe that bR

can be written as follows. bR = (αb⊕βb)R = αRb +βR

b−2αRbβ

Rb .

Thus, to obtain an arithmetic sharing of bR, the servers cancompute an arithmetic sharing of βR

b , αRb and βR

b αRb . This

can be done as follows. P0, P3 execute Πjsh4 on αRb in

the preprocessing phase to generate JαRb K. Similarly, P1, P2

execute Πjsh4 on βRb in the online phase to generate JβR

b K.This is followed by Πmult4 on JβR

b K, JαRb K, followed by local

computation to obtain JbRK.

Preprocessing :

– Servers execute Πash4(P3, eR) (Fig. 23) where e = αb ⊕ γb.

Let the shares be 〈eR〉0 = (e0, e1), 〈eR〉1 = (e1, e2), 〈eR〉2 =(e2, e0), 〈eR〉3 = (e0, e1, e2).

– Verification of 〈eR〉-sharing is performed as follows:– P1, P2, P3 sample a random r ∈ Z2` and a bit rb ∈ Z21 .– P1, P2 compute x1 = γb ⊕ rb, and jmp4-send x1 to P0.– P1, P3 compute y1 = (e1 + e2)(1− 2rRb ) + rRb + r, and

jmp4-send y1 to P0.– P2, P3 compute y2 = e0(1− 2rRb )− r, and jmp4-send H(y2)

to P0.– P0 computes x = e⊕ rb = [αb]1 ⊕ [αb]2 ⊕ x1 and checks if

H(xR − y1) = H(y2).– If verification fails, P0 sets flag = 1, else it sets flag = 0.P0 sends flag to P1. Next, P1, P0 jmp4-send flag to P2 andP3. Servers set TTP = P1 if flag = 1.

– If verification succeeds, servers locally convert 〈eR〉 to JeRK bysetting their shares according to Table VIII.

Online :

– Servers execute Πjsh4(P0, P1, P2, cR) where c = βb ⊕ γb.

– Servers execute Πmult4(P, JeRK, JcRK) to generate JeRcRK.– Servers compute JbRK = JeRK + JcRK− 2JeRcRK.

Protocol Πbit2A4(P, JbKB)

Fig. 24: 4PC: Bit2A Protocol

While the above approach serves the purpose, we nowprovide an improved version, which further helps in reducingthe online cost. We observe that bR can be written as follows.bR = (αb⊕βb)R = ((αb⊕γb)⊕(βb⊕γb))R = (e⊕c)R = eR+cR−2eRcR where e = αb⊕γb and c = βb⊕γb. Thus, to obtainan arithmetic sharing of bR, P3 generates 〈·〉-sharing of eR. Toensure the correctness of the shares, the servers P0, P1, P2

check whether the following equation holds: (e ⊕ rb)R =

eR + rRb − 2eRrRb . If the verification fails, a TTP is identified.Else, this is followed by servers locally converting 〈eR〉-sharesto JeRK according to Table VIII, followed by multiplyingJeRK, JcRK and locally computing JbRK = JeRK+JcRK−2JeRcRK.Note that during Πjsh4(P0, P1, P2, c

R) since αcR and γcR areset to 0, the preprocessing of multiplication can be performedlocally.

Lemma C.8 (Communication). Πbit2A4 (Fig. 24) requires anamortized communication of 3` + 4 bits in the preprocessingphase, and 1 round with amortized communication of 3` bitsin the online phase.

Proof: During preprocessing, one instance of Πash4 re-quires 2` bits of communication. Further, sending x1 requires

1 bit, while sending y1 requires ` bits. Sending of H(y2) can beamortized over several instances. Finally, communicating flagrequires 3 bits. Thus, the overall amortized communicationcost in preprocessing phase is 3` + 4 bits. In the onlinephase, joint sharing of cR can be performed non-interactively.The only cost is due to the online phase of multiplicationwhich requires 3` bits and one round. Thus, the amortizedcommunication cost in the online phase is 3` bits with oneround of communication.

3) Bit Injection Protocol: Given the boolean sharing of abit b, denoted as JbKB, and the arithmetic sharing of v ∈ Z2` ,protocol Πbitinj4 computes J·K-sharing of bv. This can benaively computed by servers first executing Πbit2A4 on JbKB togenerate JbK, followed by servers computing JbvK by executingΠmult4 protocol on JbK and JvK. Instead, we provide an opti-mized variant which helps in reducing the communication costof the naive approach in, both, the preprocessing and onlinephase. We give the details next.

Let z = bRv, where bR denotes the value of b in Z2` . Then,during the computation of JzK, we observe the following:

z = bRv = (αb ⊕ βb)R(βv − αv)

= ((αb ⊕ γb)⊕ (βb ⊕ γb))R((βv + γv)− (αv + γv))

= (cb ⊕ eb)R(cv − ev) = (cRb + eRb − 2cRb eRb )(cv − ev)

= cRb cv − cRb ev + (cv − 2cRb cv)eRb + (2cRb − 1)eRb ev

where cb = βb ⊕ γb, eb = αb ⊕ γb, cv = βv + γv and ev =αv + γv. The protocol proceeds with P3 generating 〈·〉-sharesof eRb and ez = eRb ev, followed by verification of the sameby P0, P1, P2. If verification succeeds, then to enable P2 tocompute βz = z + αz, P1, P0 jmp4-send the missing share ofβz to P2. Similarly for P1. Next, P1, P2 reconstruct βz, andjmp4-send βz + γz to P0 completing the protocol.

Lemma C.9 (Communication). Protocol Πbitinj4 requires anamortized communication cost of 6` + 4 bits in the pre-processing phase, and requires 1 round with an amortizedcommunication of 3` bits in the online phase.

Proof: The preprocessing phase requires two instancesof Πash4 which require 4` bits of communication. Verifyingcorrectness of 〈eRb 〉 requires ` + 1 bits, whereas for 〈ez〉 werequire ` bits. Finally, communicating the flag requires 3bits. This results in the amortized communication of 6` + 4bits in the preprocessing phase. The online phase consistsof three calls to Πjmp4 which requires 3` bits of amortizedcommunication. Note that the last call can be deferred towardsthe end of the computation, thereby requiring a single roundfor multiple instances. Thus, the number of rounds required inthe online phase is one.

Let cb = βb ⊕ γb, eb = αb ⊕ γb, cv = βv + γv, ev = αv + γvand ez = eRb ev.

Preprocessing :

– P0, P3, Pj for j ∈ {1, 2} sample [αz]1 ∈ Z2` while P1, P2, P3

sample γz ∈ Z2` .– Servers execute Πash4(P3, e

Rb ) and Πash4(P3, ez). Shares of 〈ev〉

are set locally as ev0 = [αv]2 , ev1 = [αv]1 , ev3 = γv.

Protocol Πbitinj4(P, JbKB, JvK)

21

Page 22: SWIFT: Super-fast and Robust Privacy-Preserving Machine ...Nishat Koti , Mahak Pancholi , Arpita Patra , Ajith Suresh Department of Computer Science and Automation, Indian Institute

– Servers verify correctness of 〈eRb 〉 using steps similar to Πbit2A4

(Fig. 24). Correctness of 〈ez〉 is verified as follows.– P0, P3, Pj for j ∈ {1, 2} sample a random rj ∈ Z2` whileP1, P2, P3 sample a random r0 ∈ Z2` . P0, P3 set a0 = r1−r2,P1, P3 set a1 = r0 − r1 and P2, P3 set a2 = r2 − r0.

– P1, P3 compute x1 = ev2eb2 + ev1eb2 + ev2eb1 + a1.– P2, P3 compute x2 = ev0eb0 + ev0eb2 + ev2eb0 + a2.– P0 computes x0 = ev1eb1 + ev1eb0 + ev0eb1 + a0.– P1, P3 jmp4-send y1 = x1 − ez1 to P0, while P2, P3

jmp4-send H(−y2) to P0, where y2 = x2 − ez2 .– P0 computes y0 = x0 − ez0 , and checks if H(y0 + y1) =

H(−y2).– If verification fails, P0 sets flag = 1, else it sets flag = 0. P0

sends flag to P1. Next, P1, P0 jmp4-send flag to P2 and P3.Servers set TTP = P1 if flag = 1.

Online :

– P0, P1 compute u1 = −cRb ev1 +(cv−2cRb cv)eRb1 +(2cRb−1)ez1 +[αz]1, and jmp4-send u1 to P2.

– P0, P2 compute u2 = −cRb ev0 +(cv−2cRb cv)eRb0 +(2cRb−1)ez0 +[αz]2, and jmp4-send u2 to P1.

– P1, P2 compute βz = u1 + u2 − cRb ev2 + (cv − 2cRb cv)eRb2 +(2cRb − 1)ez2 + cRb cv.

– P1, P2 jmp4-send βz + γz to P0.

Fig. 25: 4PC: Bit Injection Protocol

H. Dot Product Protocol

The formal protocol for dot product is given in Fig. 26.

Preprocessing :

– P0, P3, Pj , for j ∈ {1, 2}, sample random [αz]j ∈ Z2` , whileP0, P1, P3 sample random [Γ~x�~y]1 ∈ Z2` .

– P1, P2, P3 together sample random γz, ψ, r ∈ Z2` and set[ψ]1 = r, [ψ]2 = ψ − r.

– P0, P3 compute [Γ~x�~y]2 = Γ~x�~y − [Γ~x�~y]1, where Γ~x�~y =∑ni=1 αxiαyi . P0, P3 jmp4-send [Γ~x�~y]2 to P2.

– P3, Pj , for j ∈ {1, 2}, set [χ]j =∑n

i=1(γxi [αyi ]j +γyi [αxi ]j)+ [Γ~x�~y]j − [ψ]j .

– P1, P3 jmp4-send [χ]1 to P0, and P2, P3 jmp4-send [χ]2 to P0.

Online :

– P0, Pj , for j ∈ {1, 2}, compute [β?z ]j = −

∑ni=1((βxi +

γxi) [αyi ]j + (βyi + γyi) [αxi ]j) + [αz]j + [χ]j .– P1, P0 jmp4-send [β?

z ]1 to P2, while P2, P0 jmp4-send [β?z ]2

to P1.– Pj for j ∈ {1, 2} computes β?

z = [β?z ]1 + [β?

z ]2 and sets βz =β?z +

∑ni=1(βxiβyi) + ψ.

– P1, P2 jmp4-send βz + γz to P0.

Protocol Πdotp4(P, {JxiK, JyiK}i∈[n])

Fig. 26: 4PC: Dot Product Protocol (z = ~x� ~y)

Lemma C.10 (Communication). Πdotp4 (Fig. 26) requires anamortized communication of 3` bits in the preprocessing phase,and 1 round and an amortized communication of 3` bits in theonline phase.

Proof: The preprocessing phase requires three calls toΠjmp4, one to jmp4-send [Γ~x�~y]

2to P2, and two to jmp4-send

[χ]1 , [χ]2 to P0. Each invocation of Πjmp4 requires ` bits re-sulting in the amortized communication cost of preprocessing

phase to be 3` bits. In the online phase, there are 2 parallelinvocations of Πjmp4 to jmp4-send [β?z ]1 , [β

?z ]2 to P2, P1,

respectively, which require amortized communication of 2` bitsand one round. This is followed by another call to Πjmp4 tojmp4-send βz + γz to P0 which requires one more round andamortized communication of ` bits. As in the multiplicationprotocol, this step can be delayed till the end of the protocoland clubbed for multiple instances. Thus, the online phaserequires one round and an amortized communication of 3`bits.

I. Truncation

Given the J·K-sharing of a value v, this protocol enablesthe servers to compute the J·K-sharing of the truncated valuevd (right shifted value by, say, d positions). Given JvK and arandom truncation pair ([r] , JrdK), the value (v− r) is opened,truncated and added to JrdK to obtain JvdK. The protocol forgenerating the truncation pair ([r] , JrdK) is described in Fig.27.

– P0, P3, Pj , for j ∈ {1, 2} sample random Rj ∈ Z2` . P0, P3

sets r = R1 +R2 while Pj sets [r]j = Rj .

– P0, P3 locally truncate r to obtain rd and executeΠjsh4(P0, P3, r

d) to generate JrdK.

Protocol Πtrgen4(P)

Fig. 27: 4PC: Generating Random Truncated Pair (r, rd)

Lemma C.11 (Communication). Πtrgen4 (Fig. 27) requires 1round and an amortized communication of ` bits in the onlinephase.

Proof: The cost follows directly from that ofΠjsh4 (Lemma C.1).

J. Dot Product with Truncation

Protocol Πdotpt4 (Fig. 28) enables servers to generate J·K-sharing of the truncated value of z = ~x � ~y, denoted as zd,given the J·K-sharing of n-sized vectors ~x and ~y.

Preprocessing :

– Servers execute the preprocessing phase of Πdotp4(P,{JxiK, JyiK}i∈[n]).

– Servers execute Πtrgen4(P) to generate the truncation pair([r] , JrdK). P0 obtains the value r in clear.

Online :

– P0, Pj , for j ∈ {1, 2}, compute [Ψ]j = −∑n

i=1((βxi +γxi) [αyi ]j + (βyi + γyi) [αxi ]j) − [r]j and sets [(z− r)?]j =[Ψ]j + [χ]j .

– P1, P0 jmp4-send [(z− r)?]1 to P2 and P2, P0 jmp4-send[(z− r)?]2 to P1.

– P1, P2 locally compute (z− r)? = [(z− r)?]1 + [(z− r)?]2 andset (z− r) = (z− r)? +

∑ni=1(βxiβyi) + ψ.

– P1, P2 locally truncate (z − r) to obtain (z− r)d and executeΠjsh4(P1, P2, (z− r)d) to generate J(z− r)dK.

– Servers locally compute JzdK = J(z− r)dK + JrdK .

Protocol Πdotpt4(P, {JxiK, JyiK}i∈[n])

Fig. 28: 4PC: Dot Product Protocol with Truncation

22

Page 23: SWIFT: Super-fast and Robust Privacy-Preserving Machine ...Nishat Koti , Mahak Pancholi , Arpita Patra , Ajith Suresh Department of Computer Science and Automation, Indian Institute

Lemma C.12 (Communication). Πdotpt4 (Fig. 28) requires anamortized communication of 4` bits in the preprocessing phase,and 1 round with amortized communication of 3` bits in theonline phase.

Proof: The preprocessing phase comprises of the pre-processing phase of Πdotp4 and Πtrgen4 which results in anamortized communication of 3`+` = 4` bits. The online phasefollows from that of Πdotp4 protocol except that, now, P1, P2

compute [·]-shares of z − r. This requires one round and anamortized communication cost of 2` bits. P1, P2 then jointlyshare the truncated value of z − r with P0, which requires 1round and ` bits. However, this step can be deferred till the endfor multiple instances, which results in amortizing this round.The total amortized communication is thus 3` bits in onlinephase.

K. Activation Functions

Lemma C.13 (Communication). Protocol for relu requires anamortized communication of 13`− 2 bits in the preprocessingphase and requires log ` + 1 rounds and an amortized com-munication of 10`− 6 bits in the online phase.

Proof: One instance of relu protocol comprises of execu-tion of one instance of Πbitext4, followed by Πbitinj4. The cost,therefore, follows from Lemma C.7, and Lemma C.9.

Lemma C.14 (Communication). Protocol for sig requires anamortized communication of 23`− 1 bits in the preprocessingphase and requires log ` + 2 rounds and an amortized com-munication of 20`− 9 bits in the online phase.

Proof: An instance of sig protocol involves the executionof the following protocols in order– i) two parallel instancesof Πbitext4 protocol, ii) one instance of Πmult4 protocol overboolean value, and iii) one instance of Πbitinj4 and Πbit2A4 inparallel. The cost follows from Lemma C.7, Lemma C.8 andLemma C.9.

APPENDIX DSECURITY ANALYSIS OF OUR PROTOCOLS

In this section, we provide detailed security proofs for ourconstructions in both the 3PC and 4PC domains. We provesecurity using the real-world/ ideal-word simulation basedtechnique. We provide proofs in the Fsetup-hybrid model forthe case of 3PC, where Fsetup (Fig. 14) denotes the ideal func-tionality for the three server shared-key setup. Similarly, theproofs for 4PC are provided in the Fsetup4-hybrid model (Fig.15).

Let A denote the real-world adversary corrupting at mostone server in P , and S denote the corresponding ideal worldadversary. The strategy for simulating the computation offunction f (represented by a circuit ckt) is as follows: Thesimulation begins with the simulator emulating the shared-keysetup (Fsetup/Fsetup4) functionality and giving the respectivekeys to the adversary. This is followed by the input sharingphase in which S extracts the input of A, using the knownkeys, and sets the inputs of the honest parties to be 0. Snow knows all the inputs and can compute all the intermediatevalues for each of the building blocks in clear. Also, S can

obtain the output of the ckt in clear. S now proceeds simulatingeach of the building block in topological order using theaforementioned values (inputs of A, intermediate values andcircuit output).

In some of our sub protocols, adversary is able to decideon which among the honest parties should be chosen as theTrusted Third Party (TTP) in that execution of the protocol. Tocapture this, we consider corruption-aware functionalities [54]for the sub-protocols, where the functionality is provided theidentity of the corrupt server as an auxiliary information.

For modularity, we provide the simulation steps for each ofthe sub-protocols separately. These steps, when carried out inthe respective order, result in the simulation steps for the entire3/4PC protocol. If a TTP is identified during the simulationof any of the sub-protocols, simulator will stop the simulationat that step. In the next round, the simulator receives the inputof the corrupt party in clear on behalf of the TTP for the 3PCcase, whereas it receives the input shares from adversary for4PC.

A. Security Proofs for 3PC protocols

The ideal functionality F3PC for evaluating ckt in the 3PCsetting appears in Fig. 29.

F3PC interacts with the servers in P and the adversary S. Letf denote the functionality to be computed. Let xs be the inputcorresponding to the server Ps, and ys be the corresponding output,i.e (y0, y1, y2) = f(x0, x1, x2).

Step 1: F3PC receives (Input, xs) from Ps ∈ P , and computes(y0, y1, y2) = f(x0, x1, x2).

Step 2: F3PC sends (Output, ys) to Ps ∈ P .

Functionality F3PC

Fig. 29: 3PC: Ideal functionality for evaluating a function f

Now we provide the simulation for each of the sub-protocols.

1) Joint Message Passing (jmp) Protocol: This sectionprovides the security proof for the jmp primitive, which formsthe crux for achieving GOD guarantee in our constructions. LetFjmp (Fig. 1) denote the ideal functionality and let SPs

jmp denotethe corresponding simulator for the case of corrupt Ps ∈ P .

We begin with the case for a corrupt sender, Pi. The casefor a corrupt Pj is similar and hence we omit details for thesame.

– SPijmp initializes ttp = ⊥ and receives vi from A on behalf of

Pk.– In case, A fails to send a value SPi

jmp broadcasts"(accuse,Pi)", sets ttp = Pj , vi = ⊥, and skip to the laststep.

– Else, it checks if vi = v, where v is the value computed bySPijmp based on the interaction with A, and using the knowledge

of the shared keys. If the values are equal, SPijmp sets bk = 0,

else, sets bk = 1, and sends the same to A on the behalf of Pk.– If A broadcasts "(accuse,Pk)", SPi

jmp sets vi = ⊥,ttp = Pj , and skips to the last step.

– SPijmp computes and sends bj to A on behalf of Pj and receives

Simulator SPijmp

23

Page 24: SWIFT: Super-fast and Robust Privacy-Preserving Machine ...Nishat Koti , Mahak Pancholi , Arpita Patra , Ajith Suresh Department of Computer Science and Automation, Indian Institute

bA from A on behalf of honest Pj .– If SPi

jmp does not receive a bA on behalf of Pj , it broadcasts"(accuse,Pi)", sets vi = ⊥, ttp = Pk. If A broadcasts"(accuse,Pj)", SPi

jmp sets vi = ⊥, ttp = Pk. If ttp is set,skip to the last step.

– If (vi = v) and bA = 1, SPijmp broadcasts Hj = H(v) on behalf

of Pj .– Else if vi 6= vj : SPi

jmp broadcasts Hj = H(v) and Hk = H(vi)on behalf of Pj and Pk, respectively. If A does not broadcast,SPijmp sets ttp = Pk. Else if, A broadcasts a value HA:

• If HA 6= Hj : SPijmp sets ttp = Pk.

• Else if HA 6= Hk : SPijmp sets ttp = Pj .

– SPijmp invokes Fjmp on (Input, vi) and (Select, ttp) on behalf of

A.

Fig. 30: Simulator SPijmp4 for corrupt sender Pi

The case for a corrupt receiver, Pk is provided in Fig.31.

– SPkjmp initializes ttp = ⊥, computes v honestly and sends v and

H(v) to A on behalf of Pi and Pj , respectively.– If A broadcasts "(accuse,Pi)", set ttp = Pj , else if A

broadcasts "(accuse,Pj)", set ttp = Pi. If both messagesare broadcast, set ttp = Pi. If ttp is set skip to the last step.

– On behalf of Pi, Pj , SPkjmp receives bA from A. Let bi (resp.

bj) denote the bit received by Pi (resp. Pj) from A.– If A failed to send bit bA to Pi, SPk

jmp broadcasts"(accuse,Pk)", set ttp = Pj . Similarly, for Pj . If bothPi, Pj broadcast "(accuse,Pk)", set ttp = Pi. If ttp is set,skip to the last step.

– If bi ∨ bj = 1 : SPkjmp broadcasts Hi,Hj where

Hi = Hj = H(v) on behalf of Pi, Pj , respectively.– If A does not broadcast SPk

jmp sets ttp = ⊥. If A broadcasts avalue HA:• If HA 6= Hi : SPk

jmp sets ttp = Pj .

• Else if HA = Hi = Hj : SPkjmp sets ttp = Pi.

– SPkjmp invokes Fjmp on (Input,⊥) and (Select, ttp) on behalf of

A.

Simulator SPkjmp

Fig. 31: Simulator SPkjmp for corrupt receiver Pk

2) Sharing Protocol: Here we give he simulation stepsfor Πsh. The case for a corrupt P0 is provided in Fig.37.

Preprocessing: SP0sh emulates Fsetup and gives the keys

(k01, k02, kP) to A. The values that are commonly held along withA are sampled using appropriate shared key. Otherwise, values aresampled randomly.

Online:

– If the dealer Ps = P0:• SP0

sh receives βv on behalf of P1 and sets msg = v accordingly.• Steps for Πjmp protocol are simulated according to SPi

jmp(Fig. 30), where P0 plays the role of one of the senders.

– If the dealer Ps = P1:• SP0

sh sets v = 0 by assigning βv = αv.• Steps for Πjmp protocol are simulated similar to SPk

jmp (Fig. 31),

Simulator SP0sh

with P0 acting as the receiver.– If the dealer if P2 : Similar to the case when Ps = P1.

Fig. 32: Simulator SP0sh for corrupt P0

The case for a corrupt P1 is provided in Fig. 33. The casefor a corrupt P2 is similar.

Preprocessing: SP1jsh emulates Fsetup and gives the keys

(k01, k12, kP) to A. The values that are commonly held along withA are sampled using appropriate shared key. Otherwise, values aresampled randomly.

Online:

– If dealer Ps = P1 : SP1sh receives βv from A on behalf of P2.

– If Ps = P0 : SP1sh sets v = 0 by assigning βv = αv and sends

βv to A on behalf of Ps.– If Ps = P2 : Similar to the case where Ps = P0.– Steps of Πjmp, in all the steps above, are simulated similar toSPijmp (Fig. 30), ie. the case of corrupt sender.

Simulator SP1sh

Fig. 33: Simulator SP1sh for corrupt P1

3) Multiplication Protocol: Here we give he simulationsteps for Πmult. The case for a corrupt P0 is provided in Fig.34.

Preprocessing:

– SP0mult samples [αz]1 , [αz]2 and γz on behalf of P1, P2 and

generates the 〈·〉-shares of d, e honestly.– SP0

mult emulates FMulPre, and extracts ψ, [χ]1 , [χ]2 on behalf ofP1, P2.

Online:

– SP0mult computes [β?

z ]1 , [β?z ]2 and steps of Πjmp are simulated

according to SPijmp with A as one of the sender for both [β?

z ]1,and [β?

z ]2.– SP0

mult computes βz+γz on behalf of P1, P2 and steps of Πjmp aresimulated according to SPk

jmp with A as the receiver for βz + γz.

Simulator SP0mult

Fig. 34: Simulator SP0mult for corrupt P0

The case for a corrupt P1 is provided in Fig. 35. The casefor a corrupt P2 is similar.

Preprocessing:

– SP1mult samples [αz]1 , γz and [αz]2 on behalf of P0, P2. SP1

multgenerates the 〈·〉-shares of d, e honestly.

– SP1mult emulates FMulPre, and extracts ψ, [χ]1 , [χ]2 on behalf of

P0, P2.

Online:

– SP1mult computes [β?

z ]1 , [β?z ]2 on behalf of P0, P2, and steps of

Πjmp are simulated according to SPijmp with A as one of the sender

for [β?z ]1, and as the receiver for [β?

z ]2.– SP1

mult computes βz + γz on behalf of P2 and steps of Πjmp aresimulated according to SPi

jmp with A one of the senders for βz+γz.

Simulator SP1mult

Fig. 35: Simulator SP1mult for corrupt P1

24

Page 25: SWIFT: Super-fast and Robust Privacy-Preserving Machine ...Nishat Koti , Mahak Pancholi , Arpita Patra , Ajith Suresh Department of Computer Science and Automation, Indian Institute

4) Reconstruction Protocol: Here we give he simulationsteps for Πrec. The case for a corrupt P0 is provided in Fig.52. The case for a corrupt P1, P2 is similar.

Preprocessing:

– Srec computes commitments on [αv]1 , [αv]2 and γv on behalf ofP1, P2, using the respective shared keys.

– The steps of Πjmp are simulated similar to SPkjmp with A acting

as the receiver for Com(γv), and SPijmp with A acting as one of

the senders for Com([αv]1) and Com([αv]2).

Online:

– Srec receives openings for Com([αv]1),Com([αv]2) on behalfof P2 and P1, respectively.

– Srec opens Com(γv) to A on behalf of P1, P2.

Simulator Srec

Fig. 36: Simulator Srec for corrupt P0

5) Joint Sharing Protocol: Here we give he simulationsteps for Πjsh. The case for a corrupt P0 is provided in Fig. 37.The case for a corrupt P1, P2 is similar.

Preprocessing: SP0sh emulates Fsetup and gives the keys

(k01, k02, kP) to A. The values that are commonly held along withA are sampled using appropriate shared key. Otherwise, values aresampled randomly.

Online:

– If (Pi, Pj) = (P1, P0) : Sjsh computes βv = v + αv on behalfof P1. The steps of Πjmp are simulated similar to SPi

jmp, where theA acts as one of the senders.

– If (Pi, Pj) = (P2, P0) : Similar to the case when (Pi, Pj) =(P1, P0).

– If (Pi, Pj) = (P1, P2) : Sjsh sets v = 0 by setting βv = αv. Thesteps of Πjmp are simulated similar to SPk

jmp, where the A acts asthe receiver.

Simulator Sjsh

Fig. 37: Simulator Sjsh for corrupt P0

6) Dot Product Protocol: Here we give he simulation stepsfor Πdotp. The case for a corrupt P0 is provided in Fig.38.

Preprocessing: Sdotp emulates FDotPPre and derives ψ and respec-tive [·]-shares of χ honestly on behalf of P1, P2.

Online:

– SP0dotp computes [·]-shares of β?

z on behalf of P1, P2. The stepsof Πjmp, required to provide P1 with [β?

z ]2, and P2 with [β?z ]1,

are simulated similar to SPijmp, where P0 acts as one of the sender

in both cases.– SP0

dotp computes β?z and βz on behalf of P1, P2. The steps of the

Πjmp, required to provide P0 with βz + γz, are simulated similarto SPk

jmp, where P0 acts as the receiver.

Simulator SP0dotp

Fig. 38: Simulator Sdotp for corrupt P0

The case for a corrupt P1 is provided in Fig. 39. The casefor a corrupt P2 is similar.

Preprocessing: SP1dotp emulates FDotPPre and derives [·]-shares of

ψ, χ honestly on behalf of P0, P2.

Online:

– SP1dotp computes [β?

z ]1 on behalf of P0, and [β?z ]2 on behalf of

P0 and P2. The steps of Πjmp, required to provide P1 with [β?z ]2,

and P2 with [β?z ]1, are simulated similar to SPi

jmp, where A actsas one of the sender in the former case, and as a receiver in thelatter case.

– SP1dotp computes β?

z and βz on behalf of P2. The steps of Πjmp,required to provide P0 with βz+γz, are simulated similar to SPi

jmp,where A acts as one of the sender.

Simulator SP1dotp

Fig. 39: Simulator Sdotp for corrupt P1

7) Truncation Protocol: Here we give he simulation stepsfor Πtrgen. The case for a corrupt P0 is provided in Fig.40.

– SP0trgen samples R1, R2 using the respective keys with A.

– Steps corresponding to Πsh are simulated similar to the stepsSP0

Πshfor corrupt P0.

– SP0trgen computes u, and steps corresponding to Πjmp are simulated

similar to SPiΠjmp

.

– SP0trgen computes v. If H(u) 6= H(v), SP0

trgen broadcasts"(accuse,P0)", and sets ttp = P1.

Simulator SP0trgen

Fig. 40: Simulator SP0trgen for corrupt P0

The case for a corrupt P1 is provided in Fig. 41. The casefor a corrupt P2 is similar.

– SP1trgen samples R1 using the key k01 with A, and samples

random R2. SP1trgen sets r = R1 + R2, and truncates it to obtain

rd.– Steps corresponding to Πjsh are simulated similar to the steps

in SΠjsh . SP1trgen computes u, and steps corresponding to Πjmp are

simulated similar to the steps in SPiΠjmp

.

Simulator SP1trgen

Fig. 41: Simulator SP1trgen for corrupt P1

Observe from the simulation steps, that the view of A inthe real world and the ideal world is indistinguishable.

B. Security Proofs for 4PC protocols

The ideal functionality F4PC for evaluating a function fto be computed by ckt in the 4PC setting appears in Fig.42.

F4PC interacts with the servers in P and the adversary S. Letf denote the function to be computed. Let xs be the inputcorresponding to the server Ps, and ys be the corresponding output,i.e (y0, y1, y2, y3) = f(x0, x1, x2, x3).

Step 1: F4PC receives (Input, xs) from Ps ∈ P , and computes(y0, y1, y3, y3) = f(x0, x1, x2, x3).

Step 2: F4PC sends (Output, ys) to Ps ∈ P .

Functionality F4PC

Fig. 42: 4PC: Ideal functionality for computing f in 4PC setting

25

Page 26: SWIFT: Super-fast and Robust Privacy-Preserving Machine ...Nishat Koti , Mahak Pancholi , Arpita Patra , Ajith Suresh Department of Computer Science and Automation, Indian Institute

Now we provide the simulation for each of the sub-protocols.

1) Joint Message Passing: This section provides the secu-rity proof for the jmp4 primitive, which forms the crux forachieving GOD in our constructions. Let Fjmp4 Fig. 21 denotethe ideal functionality and let SPs

jmp4 denote the correspondingsimulator for the case of corrupt Ps ∈ P .

We begin with the case for a corrupt sender, Pi, which isprovided in Fig. 43. The case for a corrupt Pj is similar andhence we omit details for the same.

– SPijmp4 receives vi from A on behalf of honest Pk. If vi = vj

(where vj is the value computed by SPijmp4 based on the interaction

with A, and using the knowledge of the shared keys), then SPijmp4

sets bk = 0, else it sets bk = 1. If A fails to send a value, bk isset to be 1. SPi

jmp4 sends bk to A on behalf of Pk.

– SPijmp4 sends bj = bk to A, and receives bi from A on behalf of

honest Pj . If A fails to send a value, it is assumed to be 1.– SPi

jmp4 receives bi from A on behalf of honest Pl. If bk = 1,SPijmp4 sets bl = 1 and ttp = Pl, else it sets bl = 0 and ttp = ⊥.SPijmp4 sends bl to A on behalf of Pl. SPi

jmp4 invokes Fjmp4 with(Input, vi) and (Select, bl) on behalf of A.

Simulator SPijmp4

Fig. 43: Simulator SPijmp4 for corrupt sender Pi

The case for a corrupt receiver, Pk is provided in Fig.44.

– SPkjmp4 sends v, H(v) (where v is the value computed by SPk

jmp4based on the interaction with A, and using the knowledge of theshared keys) to A on behalf of honest Pi, Pj , respectively.

– SPkjmp4 receives bki, bkj from A on behalf of Pi, Pj , respectively.

If A fails to send a value, it is assumed to be 1.– SPk

jmp4 receives bk from A on behalf of honest Pl. If A fails tosend a value, bk is assumed to be 0. If bk = 1 and bki∨ bkj = 1,SPkjmp4 sets bl = 1 and ttp = Pl, else it sets bl = 0 and ttp = ⊥.SPkjmp4 sends bl to A on behalf of Pl. SPk

jmp4 invokes Fjmp4 with(Input,⊥) and (Select, bl) on behalf of A.

Simulator SPkjmp4

Fig. 44: Simulator SPkjmp4 for corrupt receiver Pk

The case for a corrupt receiver, Pl, which is the serveroutside the computation involved in Πjmp4, is provided in Fig.45.

– SPljmp4 sends bi = bj = bk = 0 to A on behalf of Pi, Pj , Pk,

respectively.– SPl

jmp4 receives bl from A on behalf of Pi, Pj , Pk, and sets ttp =⊥.

– SPljmp4 invokes Fjmp4 with (Input,⊥) and (Select, bl) on behalf

of A.

Simulator SPljmp4

Fig. 45: Simulator SPljmp4 for corrupt fourth server Pl

2) Sharing Protocol: Here we give the simulation stepsfor Πsh4. The case for corrupt P0 is given in Fig.46.

Preprocessing: SP0Πsh4

emulates Fsetup4 and gives the keys(k01, k02, k03, k012, k013, k023 and kP) to A. The values that arecommonly held with A are sampled using the respective keys,while others are sampled randomly.

Online:

– If dealer is P0, SP0Πsh4

receives βv from A on behalf of P1. Stepscorresponding to Πjmp4 are simulated according to SPi

Πjmp4where

P0 acts as one of the sender for sending βv.– If dealer is P1 or P2, SP0

Πsh4sets v = 0 by assigning βv =

αv. Steps corresponding to Πjmp4 for sending βv + γv to A aresimulated according to SPk

Πjmp4where P0 acts as the receiver.

– If dealer is P3, SP0Πsh4

sets v = 0 by assigning βv = αv. SP0Πsh4

sends βv +γv to A on behalf of P3. Steps corresponding to Πjmp4

are simulated according to SPj

Πjmp4where P0 acts as one of the

sender with P1, P2 as the receivers, separately.

Simulator SP0Πsh4

Fig. 46: Simulator SP0Πsh4

for corrupt P0

The case for corrupt P1 is given in Fig. 47. The case fora corrupt P2 is similar.

Preprocessing: SP1Πsh4

emulates Fsetup4 and gives the keys(k01, k12, k13, k012, k013, k123 and kP) to A. The values that arecommonly held with A are sampled using the respective keys,while others are sampled randomly.

Online:

– If dealer is P1, SP1Πsh4

receives βv from A on behalf of P2. Stepscorresponding to Πjmp4 are simulated according to SPi

Πjmp4where

P1 acts as one of the sender for sending βv + γv to P0.– If dealer is P0 or P2, SP1

Πsh4sets v = 0 by assigning βv = αv.

• If dealer is P0, SP1Πsh4

sends βv to A on behalf of P0. Stepscorresponding to Πjmp4 are simulated according to SPj

Πjmp4

where P1 acts as one of the sender to send βv.• If dealer is P2, SP1

Πsh4sends βv to A on behalf of P2. Steps

corresponding to Πjmp4 are simulated according to SPiΠjmp4

where P1 acts as one of the sender to send βv + γv.– If dealer is P3, SP1

Πsh4sets v = 0 by assigning βv = αv. Steps

corresponding to Πjmp4 are simulated according to SPkΠjmp4

whereP1 acts as the receiver for receiving βv + γv.

Simulator SP1Πsh4

Fig. 47: Simulator SP1Πsh4

for corrupt P1

The case for corrupt P3 is given in Fig. 48.

Preprocessing: SP3Πsh4

emulates Fsetup4 and gives the keys(k03, k13, k23, k013, k023, k123 and kP) to A. The values that arecommonly held with A are sampled using the respective keys,while others are sampled randomly.

Online:

– If dealer is P3, SP3Πsh4

receives βv + γv from A on behalf of P0.Steps corresponding to Πjmp4 are simulated according to SPi

Πjmp4

where P3 acts as one of the sender with P1, P2 as the receivers,separately.

– If dealer is P0 or P1 or P2, steps corresponding to Πjmp4 are

Simulator SP3Πsh4

26

Page 27: SWIFT: Super-fast and Robust Privacy-Preserving Machine ...Nishat Koti , Mahak Pancholi , Arpita Patra , Ajith Suresh Department of Computer Science and Automation, Indian Institute

simulated according to SPlΠjmp4

where P3 acts as the server outsidethe computation.

Fig. 48: Simulator SP3Πsh4

for corrupt P3

3) Multiplication Protocol: Here we give the simulationsteps for Πmult4. The case for corrupt P0 is given in Fig.49.

Preprocessing:

– SP0Πmult4

samples [αz]1 , [αz]2 , [Γxy]1 using the respective keyswith A. SP0

Πmult4samples γz, ψ, r randomly on behalf of the

respective honest parties, and computes [Γxy]2 honestly.– Steps corresponding to Πjmp4 are simulated according to SPi

Πjmp4

where P0 acts as one of the sender for sending [Γxy]2.– SP0

Πmult4computes [χ]1 , [χ]2 honestly. Steps corresponding to

Πjmp4 are simulated according to SPkΠjmp4

where P0 acts as thereceiver for [χ]1 , [χ]2.

Online:

– SP0Πmult4

computes [β?z ]1 , [β

?z ]2 honestly. Steps corresponding to

Πjmp4 are simulated according to SPj

Πjmp4where P0 acts as one of

the sender for sending [β?z ]1 , [β

?z ]2.

– SP0Πmult4

computes βz + γz. Steps corresponding to Πjmp4 aresimulated according to SPk

Πjmp4where P0 acts as the receiver for

receiving βz + γz.

Simulator SP0Πmult4

Fig. 49: Simulator SP0Πmult4

for corrupt P0

The case for corrupt P1 is given in Fig. 50. The case fora corrupt P2 is similar.

Preprocessing:

– SP1Πmult4

samples [αz]1 , γz, ψ, r, [Γxy]1 using the respective keyswith A. SP1

Πmult4samples [αz]2 randomly on behalf of the respec-

tive honest parties.– Steps corresponding to Πjmp4 are simulated according to SPl

Πjmp4

where P1 acts as the server outside the computation whilecommunicating [Γxy]2.

– SP1Πmult4

computes [χ]1. Steps corresponding to Πjmp4 are simu-lated according to SPi

Πjmp4where P1 acts as one of the sender for

[χ]1.– Steps corresponding to Πjmp4 are simulated according to SPl

Πjmp4

where P1 acts as the server outside the computation whilecommunicating [χ]2.

Online:

– SP1Πmult4

computes [β?z ]1 , [β

?z ]2. Steps corresponding to Πjmp4 are

simulated according to SPiΠjmp4

and SPkΠjmp4

, where P1 acts as oneof the sender for sending [β?

z ]1, and P1 acts as the receiver forreceiving [β?

z ]2, respectively.– SP1

Πmult4computes βz + γz. Steps corresponding to Πjmp4 are

simulated according to SPiΠjmp4

where P1 acts as one of the senderfor sending βz + γz.

Simulator SP1Πmult4

Fig. 50: Simulator SP1Πmult4

for corrupt P1

The case for corrupt P3 is given in Fig. 51.

Preprocessing:

– SP3Πmult4

samples [αz]1 , [αz]2 , γz, ψ, r, [Γxy]1 using the respectivekeys with A. SP3

Πmult4computes [Γxy]2 honestly.

– Steps corresponding to Πjmp4 are simulated according to SPj

Πjmp4

where P3 acts as one of the sender for sending [Γxy]2.– SP3

Πmult4computes [χ]1 , [χ]2. Steps corresponding to Πjmp4 are

simulated according to SPj

Πjmp4where P3 acts as one of the sender

for sending [χ]1 and [χ]2.

Online:

– Steps corresponding to Πjmp4 are simulated according to SPlΠjmp4

where P3 acts as the server outside the computation involving[β?

z ]1 , [β?z ]2 and βz + γz.

Simulator SP3Πmult4

Fig. 51: Simulator SP3Πmult4

for corrupt P3

4) Reconstruction Protocol: Here we give the simulationsteps for Πrec4. The case for corrupt P0 is given in Fig. 52.The cases for corrupt P1, P2, P3 are similar.

– SP0Πrec4

sends γv to A on behalf of P1, P2, and H(γv) on behalfof P3, respectively.

– SP0Πrec4

receives H([αv]1),H([αv]2), βv +γv from A on behalf ofP2, P1, P3, respectively.

Simulator SP0Πrec4

Fig. 52: Simulator SP0Πrec4

for corrupt P0

5) Joint Sharing Protocol: Here we give the simulationsteps for Πjsh4. The case for corrupt P0 is given in Fig.53.

Preprocessing:

– SP0Πjsh4

has knowledge of αv and γv, which it obtains whileemulating Fsetup4. The common values shared with the A aresampled using the appropriate shared keys, while other valuesare sampled at random.

Online:

– If dealers are (P0, P1): SP0Πjsh4

computes βv using v. Steps

corresponding to Πjmp4 are simulated according to SPj

Πjmp4where

P0 acts as one of the sender for βv.– If dealers are (P0, P2) or (P0, P3): Analogous to the above case.– If dealers are (P1, P2): SP0

Πjsh4sets v = 0 and βv = [αv]1+[αv]2.

Steps corresponding to Πjmp4 are simulated according to SPkΠjmp4

where P0 acts as the receiver for βv + γv.– If dealers are (P3, P1): SP0

Πjsh4sets v = 0 and βv = [αv]1+[αv]2.

Steps corresponding to Πjmp4 are simulated according to SPlΠjmp4

where P0 acts as the server outside the computation for βv, andaccording to SPk

Πjmp4where P0 acts as the receiver for βv + γv.

– If dealers are (P3, P2): Analogous to the above case.

Simulator SP0Πjsh4

Fig. 53: Simulator SP0Πjsh4

for corrupt P0

The case for corrupt P1 is given in Fig. 54. The case forcorrupt P2 is similar.

27

Page 28: SWIFT: Super-fast and Robust Privacy-Preserving Machine ...Nishat Koti , Mahak Pancholi , Arpita Patra , Ajith Suresh Department of Computer Science and Automation, Indian Institute

Preprocessing:

– SP1Πjsh4

has knowledge of α-values and γ corresponding to vwhich it obtains while emulating Fsetup4. The common valuesshared with the A are sampled using the appropriate shared keys,while other values are sampled at random.

Online:

– If dealers are (P0, P1): SP1Πjsh4

computes βv using v. Stepscorresponding to Πjmp4 are simulated according to SPi

Πjmp4where

P1 acts as one of the sender for βv.– If dealers are (P1, P2): Analogous to the previous case, except

that now βv + γv is sent instead of βv.– If dealers are (P3, P1): SP1

Πjsh4computes βv and βv + γv. Steps

corresponding to Πjmp4 are simulated according to SPiΠjmp4

whereP1 acts as one of the sender for βv, βv + γv.

– If dealers are (P0, P2) or (P0, P3) or (P3, P2): SP1Πjsh4

sets v =

0 and βv = [αv]1 + [αv]2. Steps corresponding to Πjmp4 aresimulated according to SPk

Πjmp4where P1 acts as the receiver for

βv.

Simulator SP1Πjsh4

Fig. 54: Simulator SP1Πjsh4

for corrupt P1

The case for corrupt P3 is given in Fig. 55.

Preprocessing:

– SP3Πjsh4

has knowledge of α-values and γ corresponding to vwhich it obtains while emulating Fsetup4. The common valuesshared with the A are sampled using the appropriate shared keys,while other values are sampled at random.

Online:

– If dealers are (P1, P2): SP3Πjsh4

sets v = 0. Steps correspondingto Πjmp4 are simulated according to SPl

Πjmp4where P3 acts as the

server outside the computation for βv + γv.– If dealers are (P0, P1) or (P0, P2): Analogous to the above case.– If dealers are (P0, P3): SP3

Πjsh4computes βv using v. Steps

corresponding to Πjmp4 are simulated according to SPiΠjmp4

whereP3 acts as one of the sender for sending βv.

– If dealers are (P3, P1): SP3Πjsh4

computes βv and βv + γv. Steps

corresponding to Πjmp4 are simulated according to SPj

Πjmp4where

P3 acts as one of the sender for sending βv, βv + γv.– If dealers are (P3, P2): Analogous to the above case.

Simulator SP3Πjsh4

Fig. 55: Simulator SP3Πjsh4

for corrupt P3

6) Dot Product Protocol: Here we give the simulationsteps for Πdotp4. The case for corrupt P0 is given in Fig.56.

Preprocessing:

– SP0Πdotp4

samples [αz]1 , [αz]2 , [Γ~x�~y]1 using the respective keyswith A. SP0

Πdotp4samples γz, ψ, r randomly on behalf of the

respective honest parties, and computes [Γ~x�~y]2 honestly.– Steps corresponding to Πjmp4 are simulated according to SPi

Πjmp4

where P0 acts as one of the sender for [Γ~x�~y]2.

Simulator SP0Πdotp4

– SP0Πdotp4

computes χ1, χ2 honestly. Steps corresponding to Πjmp4

are simulated according to SPkΠjmp4

where P0 acts as the receiverfor χ1 and χ2.

Online:

– SP0Πdotp4

computes [β?z ]1 , [β

?z ]2 honestly. Steps corresponding to

Πjmp4 are simulated according to SPj

Πjmp4where P0 acts as one of

the sender for [β?z ]1 , [β

?z ]2.

– SP0Πdotp4

computes βz + γz. Steps corresponding to Πjmp4 aresimulated according to SPk

Πjmp4where P0 acts as the receiver for

βz + γz.

Fig. 56: Simulator SP0Πdotp4

for corrupt P0

The case for corrupt P1 is given in Fig. 57. The case forcorrupt P2 is similar.

Preprocessing:

– SP1Πdotp4

samples [αz]1 , γz, ψ, r, [Γ~x�~y]1 using the respectivekeys with A. SP1

Πdotp4samples [αz]2 randomly on behalf of the

respective honest parties.– Steps corresponding to Πjmp4 are simulated according to SPl

Πjmp4

where P1 acts as the server outside the computation for [Γ~x�~y]2.– SP1

Πdotp4computes χ1. Steps corresponding to Πjmp4 are simulated

according to SPiΠjmp4

where P1 acts as one of the sender for χ1.

– Steps corresponding to Πjmp4 are simulated according to SPlΠjmp4

where P1 acts as the server outside the computation for χ2.

Online:

– SP1Πdotp4

computes [β?z ]1 , [β

?z ]2. Steps corresponding to Πjmp4 are

simulated according to SPiΠjmp4

and SPkΠjmp4

, where P1 acts as oneof the sender for [β?

z ]1, and P1 acts as the receiver for [β?z ]2.

– SP1Πdotp4

computes βz + γz. Steps corresponding to Πjmp4 aresimulated according to SPi

Πjmp4where P1 acts as one of the sender

for βz + γz.

Simulator SP1Πdotp4

Fig. 57: Simulator SP1Πdotp4

for corrupt P1

The case for corrupt P3 is given in Fig. 58.

Preprocessing:

– SP3Πdotp4

samples [αz]1 , [αz]2 , γz, ψ, r, [Γ~x�~y]1 using the respec-tive keys with A. SP3

Πdotp4computes [Γ~x�~y] honestly.

– Steps corresponding to Πjmp4 are simulated according to SPj

Πjmp4

where P3 acts as one of the sender for [Γ~x�~y]2.– SP3

Πdotp4computes χ1, χ2. Steps corresponding to Πjmp4 are

simulated according to SPj

Πjmp4where P3 acts as one of the sender

for χ1, χ2.

Online:

– Steps corresponding to Πjmp4 are simulated according to SPlΠjmp4

where P3 acts as the server outside the computation for[β?

z ]1 , [β?z ]2, βz + γz.

Simulator SP3Πdotp4

Fig. 58: Simulator SP3Πdotp4

for corrupt P3

28

Page 29: SWIFT: Super-fast and Robust Privacy-Preserving Machine ...Nishat Koti , Mahak Pancholi , Arpita Patra , Ajith Suresh Department of Computer Science and Automation, Indian Institute

7) Truncation Pair Generation: Here we give the simula-tion steps for Πtrgen4. The case for corrupt P0 is given in Fig.59. The case for corrupt P3 is similar.

– SP0Πtrgen4

samples R1, R2 using the respective keys with A.

– Steps corresponding to Πjsh4 are simulated according to SP0Πjsh4

(Fig. 53).

Simulator SP0Πtrgen4

Fig. 59: Simulator SP0Πtrgen4

for corrupt P0

The case for corrupt P1 is given in Fig. 60. The case forcorrupt P2 is similar.

– SP1Πtrgen4

samples R1 using the respective keys with A, andsamples R2 randomly.

– Steps corresponding to Πjsh4 are simulated according to SP1Πjsh4

(Fig. 54).

Simulator SP1Πtrgen4

Fig. 60: Simulator SP1Πtrgen4

for corrupt P1

Observe from the simulation steps, that the view of A inthe real world and the ideal world is indistinguishable.

29