architectural and api proposal for the secure processing ... · privacy-preserving storage and...

29
WP3 – Privacy preserving storage and computation architecture 1 Architectural and API proposal for the secure processing stack D3.1 Project reference no. 653884 February 2016

Upload: others

Post on 17-Jun-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Architectural and API proposal for the secure processing ... · privacy-preserving storage and computation1. Heavily rooted in the cloud computing paradigm, such architecture comprises

WP3–Privacypreservingstorageandcomputationarchitecture 1

Architectural and API

proposal for the secure processing stack

D3.1

Project reference no. 653884

February 2016

Page 2: Architectural and API proposal for the secure processing ... · privacy-preserving storage and computation1. Heavily rooted in the cloud computing paradigm, such architecture comprises

WP3–Privacypreservingstorageandcomputationarchitecture 2

Documentinformation

Scheduleddelivery 29.02.2016Actualdelivery 01.03.2016Version 1.0ResponsiblePartner INESC-TEC

Disseminationlevel

Public

Revisionhistory

Date Editor Status Version

Changes

07.02.2016 Francisco Maia, JoãoPaulo

Draft 0.1 FirstDraft

10.02.2016 Bernardo Portela, DanBogdanov

Draft 0.2 Architecturedefinition

17.02.2016 João Paulo, FranciscoMaia, Bernado Portela,DanBogdanov

Submission 0.3 FirstSubmission

18.02.2016 Miguel Pardal, MiguelCorreia

Revision 0.4 INESCRevision

19.02.2016 João Paulo, FranciscoMaia, Bernado Portela,DanBogdanov

Revision 0.5 Changesaccordingtorevision.

26.02.2016 Valerio Schiavoni andHuguesMercier

Revision 0.6 UniNERevision

29.02.2016 João Paulo, FranciscoMaia,KarlTarbe

Finalversion

1 Finalversionofthedeliverable

31.08.2016 João Paulo, HuguesMercier

Finalversion

1.1 Commentsclean-up

Contributors

BernadoPortela(INESC-TEC)DanBogdanov(CYBERNETICA)KarlTarbe(CYBERNETICA)ReimoRebane(CYBERNETICA)FranciscoMaia(INESC-TEC)JoãoPaulo(INESC-TEC)RogérioPontes(INESC-TEC)

Page 3: Architectural and API proposal for the secure processing ... · privacy-preserving storage and computation1. Heavily rooted in the cloud computing paradigm, such architecture comprises

WP3–Privacypreservingstorageandcomputationarchitecture 3

Internalreviewers

MiguelPardal(INESC-ID)MiguelCorreia(INESC-ID)ValerioSchiavoni(UniNE)HuguesMercier(UniNE)

AcknowledgementsThis project is partially funded by the European Commission Horizon 2020 workprogrammeundergrantagreementno.653884.Moreinformation

Additional information and public deliverables of SafeCloud can be found athttp://www.safecloud-project.eu

Page 4: Architectural and API proposal for the secure processing ... · privacy-preserving storage and computation1. Heavily rooted in the cloud computing paradigm, such architecture comprises

WP3–Privacypreservingstorageandcomputationarchitecture 4

Glossaryofacronyms

Acronym DefinitionD DeliverableDoA DescriptionofWorkEC EuropeanCommissionPM ProjectManagerPO ProjectOfficerWP WorkPackageSQL StructuredQueryLanguageAPI ApplicationInterfaceICT InformationandCommunicationsTechnologyIT InformationTechnologyACID Atomicity,Consistency,Isolation,Durability

Page 5: Architectural and API proposal for the secure processing ... · privacy-preserving storage and computation1. Heavily rooted in the cloud computing paradigm, such architecture comprises

WP3–Privacypreservingstorageandcomputationarchitecture 5

Tableofcontents

Documentinformation.................................................................................................................................2

Disseminationlevel.......................................................................................................................................2

Revisionhistory..............................................................................................................................................2

Contributors.....................................................................................................................................................2

Internalreviewers.........................................................................................................................................3

Acknowledgements.......................................................................................................................................3

Moreinformation...........................................................................................................................................3

Glossaryofacronyms....................................................................................................................................4

Tableofcontents............................................................................................................................................5

1 Executivesummary..............................................................................................................................7

2 Generalarchitectureforprivacy-preservingcomputation................................................10

2.1 DeploymentScenario......................................................................................................................................10

2.2 GeneralsolutionAPI........................................................................................................................................12

2.2.1 SQLAPI.......................................................................................................................................................13

2.2.2 NoSQLAPI.................................................................................................................................................14

2.2.3 Discussion.................................................................................................................................................14

2.3 GeneralSolutionArchitecture.....................................................................................................................15

3 Solution1:Secureprocessinginasingleuntrusteddomain.............................................16

3.1 Introduction........................................................................................................................................................16

3.2 ArchitectureandAPI.......................................................................................................................................17

3.2.1 Trusteddeployment.............................................................................................................................18

3.2.2 Untrusteddeployment........................................................................................................................19

3.3 Discussion.............................................................................................................................................................19

4 Solution2:Secureprocessinginmultipleuntrusteddomains..........................................20

4.1 Introduction........................................................................................................................................................20

4.2 ArchitectureandAPI.......................................................................................................................................21

4.2.1 Trusteddeployment.............................................................................................................................22

Page 6: Architectural and API proposal for the secure processing ... · privacy-preserving storage and computation1. Heavily rooted in the cloud computing paradigm, such architecture comprises

WP3–Privacypreservingstorageandcomputationarchitecture 6

4.2.2 Untrusteddeployment........................................................................................................................23

4.3 Discussion.............................................................................................................................................................23

5 Solution3:Secureprocessinginmultipleuntrusteddomainswithuntrustedclients 24

5.1 Introduction........................................................................................................................................................24

5.2 Architecture........................................................................................................................................................25

5.2.1 Proxycomponent...................................................................................................................................25

5.2.2 Back-endComponent...........................................................................................................................26

5.3 Discussion.............................................................................................................................................................26

6 Conclusion............................................................................................................................................27

7 References............................................................................................................................................28

Page 7: Architectural and API proposal for the secure processing ... · privacy-preserving storage and computation1. Heavily rooted in the cloud computing paradigm, such architecture comprises

WP3–Privacypreservingstorageandcomputationarchitecture 7

1 Executivesummary

TheSafeCloudframeworkhasthreelayers,asdepictedinFigure1presentsthesecurecommunicationtechnologythatensuresthatpartiescanformsecurechannelsthatareeven harder to eavesdrop or block; SafeCloud secure storage technologies go beyondencryption and offer protection against deletion and destruction; SafeCloud securequery technology extends cryptographic protection from database storage to dataprocessing. An ICT system developer can combine solutions from the three layers toprovidebeyond-state-of-the-artprotectionforpersonaldataorbusinesssecrets.

Figure1:TheSafeCloudframework

The objective of work package 3 (WP3) is centred on the practical and theoreticalinvestigationofmechanismsfordatadistributionandprocessing, inawaythatallowsforsecurityassurancestobeprovidedwhilstmaintainingcomputationcapabilities.Itis,therefore,focusedonthesecurequeryorsecureprocessinglayer(bottomofthetable).

Thisdeliverable(D3.1)isthefirstofWP3,tobedeliveredonmonth6.WedescribethegeneralarchitectureandthreedifferentsolutionsforprotectingprivatedataduringSQLqueryexecution.Thethreesolutionsprovidevaryinglevelsofsecurityandperformance.Forexample,solutions1and2canachievehigherperformancelevels,butassumethatthequerysourcecanbetrustedwithallthedata.However,solution3nolongerassumesthe query source can be trusted and uses a stronger encryption scheme. This lastsolution is thus very suitable for securely processing data collected from multiplesources.

This document is structured as follows. Section 2 will describe the high levelarchitecture that is common to the three solutionsproposed in thisWP. Sections3, 4and 5 match solutions 1, 2 and 3 respectively, detailing what are the proposedapproaches,theassociatedmotivation,andhowtheyfitthegeneralarchitecture.Finally,Section5concludesthedeliverablewithageneraloverviewofitscontents.

Solution: Vulnerability-tolerantchannels Protectedchannels Route-awarechannels

Gives:Tolerancetovulnerabilities

incomponents

Decreasedriskoffakecertificates;resistancetoportscansandenumeration

ofnetworkinfrastructure

Improvedconfidentialitywithwarningsaboutroutehijackingandmakingharder

accesstocommunication

API: ExtendedsecuresocketAPI ExtendedsecuresocketAPI ExtendedsecuresocketAPI

Providedby: INESC-ID,TUM INESC-ID,TUM INESC-ID,TUM

Solution: Distributedencryptedfilesystem Long-termdistributedencrypteddatastorage

Secureblockstorage

Gives: EncryptedfilestorageEntangledimmutabledatastorageforprotectionagainsttampering

andcensorship

Blockstorageonindividualdatacenters

API: POSIX REST(S3orsimilar) Key/value

Providedby: UniNE,INESC-ID UniNE,INESC-TEC UniNE,INESC-TEC

Solution: Secureprocessinginasingleuntrusteddomain

Secureprocessinginmultipleuntrusteddomains

Secureprocessinginmultipleuntrusteddomainswithuntrustedclients

Gives:Privacyofdatavaluesagainsttheserver,optionalkeyprivacy

Privacyofkeyanddatavaluesagainsttheservers

Privacyofkeyanddatavaluesagainsttheserversandclients

API: SQL SQL SQL

Providedby: INESC-TEC INESC-TEC,Cyber Cyber

SafeCloud architecture

Secure

storage

Stateoftheart:

Encryptedstorage

Stateoftheart:

CryptDB

Secure

queries

Secure

communication

Stateoftheart:

TLSsecurechannels

Page 8: Architectural and API proposal for the secure processing ... · privacy-preserving storage and computation1. Heavily rooted in the cloud computing paradigm, such architecture comprises

WP3–Privacypreservingstorageandcomputationarchitecture 8

The secure processing layer of SafeCloud is focused in investigating, evaluating andimplementing solutions that provide adequate trade-offs between performance andsecuritylevels,leveragingtrustmodelsandcomputationalcapacitiesrequiredfromthetrustedanduntrusteddeployments.However, real-worldscenariosand theirpracticaldeployments canwidely vary, so it is highly unlikely to expect a single solution to fiteveryparticularcase.Withthatinmind,alternativelytowhatcouldbeachievedwithasingle andmore focused approach,wepropose three solutions forprivacy-preservingstorage that are suited for applications with distinct security and performancerequirements.Thisdecisionallowsextendingthepracticabilityandapplicabilityofthesecure processing layer. The compromises in designing our proposed solutions aredepictedinFigure2.

Figure2:SafeCloudsolutionrangeandtrade-offs

Thefirstsolutiondescribesasimplecloudsetting,inwhichclientshaveaccesstoalocalsecure SQL engine, andwant to outsource data storage and computation to a remoteNoSQLengine.Whenthisdeploymentisconsideredtobeuntrusted(asisthecasewhentheoneholdingthedataisresponsibleforhandlingsensitiveinformation),theissueofsecurityandprivacyshouldbetakenintoconsiderationforprotocoldevelopment,whilestillensuringthatthesolutionremainspracticalandapplicableinrealworldscenarios.For instance, thechoiceofusingstandardcryptographic techniquesonalldataallowsfor achieving reasonably strong security guarantees, however it disables computationoverthedata,whichinmanycasesdefeatsthepurposeofemployingacloudprovider.Assuch,oursolutionproposes theusageofdifferent techniques fordifferent levelsofdata sensitivity, aiming to provide “good enough” security assurances while enablingefficientcomputationoverstoreddata.

Thesecondsolutionaimstoleveragemultipleuntrustednon-colludingcloudproviderstoenhancesecurityguaranteeswithoutlosingcomputationalfunctionalities.Itassumesthe same client with a secure SQL engine, but now it communicates with several

Page 9: Architectural and API proposal for the secure processing ... · privacy-preserving storage and computation1. Heavily rooted in the cloud computing paradigm, such architecture comprises

WP3–Privacypreservingstorageandcomputationarchitecture 9

untrustedNoSQLcloudproviders.Byemployingcryptographictechniquesthatrelyondistributinginformationoverthesenon-colludingentities,state-of-the-artprotocolsformulti-party computation can enable secure function execution over data that is keptconfidentialfromeveryindividualcloudprovider.Theperformancevaluesareexpectedtobelowerthantheonesfromthepreviousapproach,howeverthissettingallowsforcomputationoverdatathatiskeptinformation-theoreticallysecureatalltimes.

The third and final solution is different from the previous ones in twomajor points.First,itreducesthecomputationalpowerrequiredfromtheclient,byallowingtheSQLenginetoberunintheuntrustedenvironment.Second,itallowsformultipleclientstoprocessoverthestoreddatawithoutassumingthatallofthemaretrusted(i.e.thattheytrusteachother).Thelatteropensthepossibilityforseveraluserswithsensitivedatatojointlystoredatawithsecurityguarantees,whileallowingcomputations(suchasdataanalytics) tobeexecutedover the collected information.This isparticularly impactfulforclientssuchashospitalsandgovernmentalinstitutions,whichmanagelargeamountssensitivedatabutalsowouldhighlybenefitfromjointstatisticalanalysisoverit.

Page 10: Architectural and API proposal for the secure processing ... · privacy-preserving storage and computation1. Heavily rooted in the cloud computing paradigm, such architecture comprises

WP3–Privacypreservingstorageandcomputationarchitecture 10

2 Generalarchitectureforprivacy-preservingcomputation

Cloud Computing has become a key paradigm for application and servicedeployment.Companiesbenefit from thepay-as-you-gomodel andavoidhighupfrontcostswithITinfrastructures.Themovetothecloudparadigmalsotargetsperformance,since the cloud resources canadaptand scaleaccording to thedemand. Suchbenefitscomeat theexpenseof the company’s controlover the infrastructureand require thecompanytotrustonathirdparty-acloudprovider-withthecompany’ssoftwarestackanddata.Therefore, taking advantageof cloud capabilities is non-trivial as this raisesseveral security and privacy concerns, as well as architectural challenges [AFG+10,MSL+11].

The lack of secure and private solutions for data storage and computation incurrentcloudenvironmentsdemandsdifferentdeploymentcompromises,whichinturnleadtosuboptimalsystems. Inparticular,evenifcompaniescanlargelybenefit fromamovetocloudservices,theyavoiddoingsoduetosecurityconcerns[DROPBOX12].Asaconsequence, thesolution is tocontinuerelyingon-premises’ infrastructures toholdthemostcriticalpartsofthecompany’ssystem.Theresultingsystem,therefore,consistsof a hybrid infrastructure composed by on-premises storage and computation and byoffloaded storage and computing capabilities to the cloud. Considering this setting,different compromises can be obtained according to the amount of offloadedresponsibilities,whicharerelatedtothetrustleveleachcompanyattributestothecloudprovider.

In this section, we present a general architecture for SafeCloud solutions toprivacy-preserving storage and computation1. Heavily rooted in the cloud computingparadigm,sucharchitecturecomprisesasetofcomprehensivesolutionstoprivateandsecure data storage and computation considering the project use cases. Each solutionoffers practical trade-offs between performance and security as well as betweenprocessingpowerandinfrastructuretrustworthiness.

WebeginbydescribingtheassumptionsconsideredinthedesignofSafeCloud’sprivacy-preserving processing solutions by presenting the planned deploymentscenario.Weproceedpresentingthegeneralarchitecturethatabstractseachoneofthedifferentsolutionsandcoverstheirkeycomponents.Finally,eachSafeCloudsolutionispresenteddetailingthesecurityandtrustmodelsitconsiders,andarepresentativeusecaseofitsapplicability.

2.1 DeploymentScenario

Traditionally,companiesandorganisationsinternallydevelopandmaintaintheirIT infrastructure to support their activities. Such practice has, often, very high costsassociatedtotheinfrastructureitselfaswellasderivingfromproblemssuchasresourceover-provisioningorunder-provisioning.Morerecently,thecloudcomputingparadigmchangedhowapplicationsarebeingdeployedandhowITinfrastructuresaremanaged.

1 Data computation will also be referred as data processing in this document. In the context of SafeCloud, data computation or processing is related to the knowledge that can be obtained from processing a given set of data. In particular, this encompasses data aggregation, data filtering and data analytics to mention a few.

Page 11: Architectural and API proposal for the secure processing ... · privacy-preserving storage and computation1. Heavily rooted in the cloud computing paradigm, such architecture comprises

WP3–Privacypreservingstorageandcomputationarchitecture 11

Thecoreideaofthisnewparadigmistoleveragelargeinfrastructuredeploymentsinapay-as-you-gomodelwhereclientscandynamicallyallocateandreleaseresourcesfromsuchdeploymentsandpayonlyforwhattheyuse.Adirectconsequenceofthismodelisthat, now, a company can have access to large infrastructures and set-up largedeploymentswithoutincurringinsignificantup-frontcosts[AFG+10].

Modernapplicationandservicedeploymentscanfallintooneofthreecategories.Thefirstoneincludesin-housedeploymentswhereeverythingfromITinfrastructuretosoftware is designed, developed, andmaintainedwithin the company or organizationfacilities.Thesecondoneistheoppositesituationwhereeverythingishandedovertoacloudoragroupofcloudprovidersandtheorganizationorcompanysimplymanagesitsinfrastructureremotely.Thethirdcategoryincludesanycompromisebetweenthetwoprevious ones. This last category comprehends, for instance, situationswhere criticalservicesarekeptonpremisesandcomplementaryservicesaredeployedinthecloudforeconomicalbenefits.

Considering the third category of services, different trade-offs are possiblebetweenwhattodeployonpremisesandonthecloud.Thesearesecuritytrade-offsbutalso performance trade-offs. For instance, if the necessary computational and storageresources exist on premises, doing all the computation there may result in betterperformancedue to the fact that securitymechanismsbenefit fromhavingapowerfultrustedenvironment,anddonothaveaheavyimpactovertheprocessingmechanismsthemselves. On the other hand, the cloud computingmodel can bemore attractive interms of cost since the pay-as-you-go model allows the allocation of computationalresources only when they are required and, for large-scale applications, the cloud’selasticitycharacteristicsmakeitanidealchoice.

However, offloading data storage and processing to an untrusted environmentwhile keepingdataprivate, requires complex securitymechanisms thatmay lead to adecreaseinapplicationsperformance[CK08].

Figure3depictsanoverviewofsuchadeployment.Typically,aproxyisrequiredtointegrateboththelocalandtheremotedeployments.Thisproxyisstillconsideredtobelocatedonpremises.

Page 12: Architectural and API proposal for the secure processing ... · privacy-preserving storage and computation1. Heavily rooted in the cloud computing paradigm, such architecture comprises

WP3–Privacypreservingstorageandcomputationarchitecture 12

Figure3:Deploymentscenario

Given the distinct advantages of deploying applications in an on premisesinfrastructureandonacloudinfrastructure,inSafeCloudweassumethefollowingtrustconsiderations. The on premises infrastructure is assumed to be completely trusted,whichimpliesthatitisallowedtoreceive,handleandprocesssensitivedataintheclear(thisdoesnotmeanthatitcannotbeattackedorhavemaliciousinsiders,onlythatthesearenottheproblemsSafeCloudaimstosolve).Ontheotherhand,cloudprovidersarenot allowed to deviate from the system’s specified behavior, but may attempt tocomputeover collecteddata toascertainadditional information.Thisassumption thatcloud providers can compromise the privacy of data stored and computed at theirinfrastructuresisacoreconcernoftheSafeCloudprojectandofthisworkpackage.Asaconsequence, data processing on premises can be done over in the clear data andwithoutaccessrestrictions.Inthecloud,however,sensitivedataneedstobeprotectedandprocessingmaybeimpairedinordertoachievesuchprotection.

As an example, take a clinical analysis laboratory that processes and storessensibledata frompatients,andwhereanonpremisescluster infrastructure,withthenecessarysecuritymechanismstobesafeagainstexternalattackers, isavailable.Ifthenumber of patients is very large or grows significantly, and the on premises clusterstorageandprocessingcapabilitiesareno longersufficient, itmaybedesirable tooff-load the data and some of the processing capabilities to a third-party infrastructure,suchas a cloud infrastructure.However, this infrastructure isno longer controlledbythe applications and data rightful owners, which may lead to security breaches andundesired informationdisclosure.These concernswerevery recently confirmed tobewell justifiedaftertherecentmediahypesurroundingtherevelationsmadebyformerNSA contractor Edward Snowden and high-profile security vulnerabilities such as theHeartbleed[D+14] bug in OpenSSL[L13]. In SafeCloud, cloud infrastructures areconsidereduntrustedenvironments.

2.2 GeneralsolutionAPI

HavingdescribedthedeploymentscenarioconsideredbySafeCloud’ssecureprocessingsolutions,wenowfocusontheinterfacesthatwillbeofferedtotheapplications.

Page 13: Architectural and API proposal for the secure processing ... · privacy-preserving storage and computation1. Heavily rooted in the cloud computing paradigm, such architecture comprises

WP3–Privacypreservingstorageandcomputationarchitecture 13

AmajorgoalforanycomponentintheSafeCloud’sstackisapplicabilityandpracticality.SafeCloud aims at developing software components that can have immediate andwidespreadadoptionfromtheindustry.Accordingly,itbecomesinevitabletopursuethedefinition of system interfaces that can accommodate current practices, that relievecurrentsystemsandapplicationsofanykindofrefactoring,whileremainingcompatiblewithSafeCloud’sgoals.

Lookingatthecurrentlandscapeintermsofdatamanagementsystemsitispossibletocategorize systems into two major groups according to their interface (API). On onehand, we have traditional relational database systems that rely on a query language,typicallySQL,as their interface.SQL isawidely-usedprogramming languagedesignedformanagingdatastoredatrelationaldatabasesystems[LCW93].Therelationalmodelused in these traditional SQL databases is well-studied and understood, and SQL issupported as the main query language of the most successful commercial databasesystems.

Ontheotherhand,wehavetheso-calledNoSQLdatabases[HHL+11].NoSQLdatabasesare becoming increasingly popular among applications that must scale for largeamounts of data and clients. A major advantage of NoSQL over traditional relationaldatabases is that scalability can be achieved in a distributed environment withcommodityhardware, thusnot requiringexpensivededicatedhardware,whichallowstolowerdatastorageandprocessingcosts.Ontheotherhand,NoSQLdatabaseshavealimited set of operationswhen compared to relational databases. Even thoughNoSQLdatabasesarenowfocusedonofferingricherquerylanguagesandthatundertheNoSQLdesignationwecanfinddifferentdatamodels(documentbased,key-value,graphs,etc.),themost successfulandwidespreadNoSQLdatabases, in their corecanbe seenasandictionary composed by several unique keys that are associated with one or morevalues.Itispossibletoinsertnewkeysandvalues,toupdatethevalueofaspecifickey(put),togetthevalueforaspecifickey(get),togetthevaluesforadeterminedrangeofkeys(scan)andtodeleteexistingkeys(delete).

Notwithstanding the fact that NoSQL databases are successfully deployed in manycontexts and applications and have undeniable popularity, a significant slice of theworld’ssystemsare,andarguablywillcontinuetobe,heavilyrootedonSQLinterfacesandtherelationalmodel.Thissituationcanbepartiallyexplainedbytheabundanceoflegacy applications or simply because SQL is a significantly richer API in terms ofexpressiveness. Additionally, many applications require ACID guarantees andtransactional support, which are key to enable support for multiple and concurrentclientaccesstothesamedatabase.ThiskindofguaranteesarealsotypicaloftraditionalrelationaldatabasesandlackinginNoSQLdatabases.DuetothedistinctadvantagesofSQLandNoSQLapproaches, thisworkpackagewill supportboth typesofAPIs,whicharefurtherdetailednext.

2.2.1 SQLAPI

In thedesignof thesecuredatastorageandprocessingsolutionswewill consider theSQLlanguagespecificationasitisdefinedbytheAmericanNationalStandardsInstitute(ANSI),whichadopteditasastandardin1986[DD+93,ANSI16].Althoughthisstandardis still being improved, most relational database solutions try to follow the samespecificationof theSQL language.Since this isaveryrich language,havingastandardspecification is key for the interoperability of distinct database products. The SQL

Page 14: Architectural and API proposal for the secure processing ... · privacy-preserving storage and computation1. Heavily rooted in the cloud computing paradigm, such architecture comprises

WP3–Privacypreservingstorageandcomputationarchitecture 14

language allows the expression of statements to change the schema of relationaldatabases, to insert,updateanddeletedata from thedatabasesandalso toperformavastrangeofqueriesoverthedatastoredatthesedatabases.

Asmentioned previously, transactional support is another important characteristic ofSQL databases that, formany applications, is key formaintaining data integrity, thusensuringthatmultipleclientsaccessingthesamedatabaseseeaconsistentstateofthedatabaseatalltimes.

2.2.2 NoSQLAPI

Under theNoSQL term it is possible to find a diverse set of scalable andwidespreaddatabase technologies. In SafeCloud,wewill focus on ApacheHBase, one of themostsuccessfulandwidelyusedNoSQLdatabases[APACHE16].Notonlyitisconsideredoneof themostmatureNoSQLdatabases in themarket but it is also theNoSQLdatabasewithwhich theSafeCloudpartnershavea longer,continuedandextensiveknowledge.This is a clear advantage for the success of the project. Consequently, the NoSQLinterfaceofferedbySafeCloud’ssolutionwillbeheavilyinspiredbytheHBaseAPIwhichwenowdescribeindetail.

Apache HBase is a distributed, scalable and open-source non-relational database.InspiredbyGoogle'sBigTable, itcanbe thoughtasamulti-dimensionalsortedmaportable.Eachrowisidentifiedbyauniquekeyandmaybecomposedofseveralcolumns(values). HBase exposes a set of operations for data access that are quite similar tooperationsusedinotherkey-valuedatastores:

GET-Getkey-valuepairsofagivenrow;

PUT-Insertakey-valuepairforexistingornewrow;

SCAN-Getallkey-valuepairsofarowrange;

DELETE-Removeoneormorekey-valuepairsofrows;

ThissetofoperationswillconstitutetheSafeCloudsecureprocessingNoSQLAPI.

2.2.3 Discussion

AllthesolutionsdescribedinthisworkpackagewillprovideaSQLinterface,initsANSIdefinition,alongwithtransactionalsupport.Nevertheless,itisimportanttonoticethat,in order to provide certain security and privacy guarantees while maintainingacceptableperformancelevels,someofthesolutionsmaynotoffercompletecoverageoftheSQLlanguagefeatures.Wheneverthisisthecase,wewilldetailtheactuallanguagecoverageofthesolutionandthemotivationsbehindthoselimitations.

While offering a SQL interface allows SafeCloud solutions to be attractive and highlyapplicable for nowadays systems and applications, specific classes of applications areleft out. In particular, applications to which NoSQL databases completely fulfill theirrequirements.Accordingly, it shouldalsobeaconcern forSafeCloudtooffersolutionswith this type of interface, as it expands its scope to a larger set of applications. Byleveraging previous research developed on INESC TEC in the context of several EUresearchprojects,weprovidesolutionscapableoffulfillingbothdemands:aSQLandaNoSQLinterface.Moreprecisely,weaimtoprovideasystemthatoffersafullANSISQL

Page 15: Architectural and API proposal for the secure processing ... · privacy-preserving storage and computation1. Heavily rooted in the cloud computing paradigm, such architecture comprises

WP3–Privacypreservingstorageandcomputationarchitecture 15

API along with transactional support, built on top of a NoSQL database in an ultra-scalabledesign[CNIMBO09,CPAAS14].

2.3 GeneralSolutionArchitecture

InthissectionwedescribethehighlevelarchitectureforSafeCloud’sprivacy-preservingprocessing solutions. The goal of providing this overview is to allow for theestablishmentof clear relationshipsbetween thedifferent solutions,highlighting theirdifferencesandsimilarities.Weconsiderthisexerciseessentialfortherigorousanalysisof each individual solution, and for theevaluationof its feasibilitywith respect to theassociatedapplicationrequirements.

FromaveryhighlevelperspectiveandconsideringthedeploymentscenarioofSection2.1,wecanidentifytwomajorsystemenvironments:anon-premisesinfrastructureandanexternalinfrastructure,typicallyaCloudprovider.Inbothenvironments,thechoiceof system components deployed and tasks performed leads to different compromisesamongcost,performance,securityandprivacy.

Tothesetofcomponentsrunningon-premiseswewillcalltrusteddeployment,andtothe set of components offloaded to third-party premises we will call untrusteddeployment. It is intuitive to visualize the systems that lie at the extremes of thisnotationscheme:ononeside,wehavea trusteddeployment thatencompassesall thedesired functionalities, thusnot requiringanyoffloadingof storageor computation totheuntrusteddeployment;ontheotherside,asystemwhereallcomputationisdoneintheuntrusteddeployment,possiblyduetolackofsystemresourcesatthetrustedside.The SafeCloud solutions represent compromises lying in between these extremescenarios.

Animportantaspecttonoticeisthatboththetrustedandtheuntrusteddeploymentcanbecomposedbyoneormoreinfrastructures.Forinstance,wemightconsiderascenarioinwhich trust is distributed among several cloudproviders, so as to avoid offloadingdatastorageandcomputationtoasingleentity’ssuperintendence.Havingthisinmind,we propose a SafeCloud general architecture for privacy-preserving data storage andprocessing depicted in Figure 4. We make use of squares to represent processingcomponents,roundededgerectanglestorepresentinterfacesandcylinderstorepresentstoragecomponents.Storagecomponentscanalsoofferprocessingcapabilities.

Page 16: Architectural and API proposal for the secure processing ... · privacy-preserving storage and computation1. Heavily rooted in the cloud computing paradigm, such architecture comprises

WP3–Privacypreservingstorageandcomputationarchitecture 16

Figure4:GeneralArchitecture

We consider a set of components in each one of the deployments. Each solutionwillofferaspecificinterfacetotheapplicationemployingit,theClientInterfacecomponent.As described in previous sections, this interface will typically expect SQL queries,althoughaNoSQLinterfacewillalsobeavailableforapplications.Atthesametime,aninterface to theuntrusteddeployment is alsodefined for each solution.Note that thiscanalsocorrespondtoaSQLinterface,toaNoSQLinterface,ortoasubsetofeitheroneof those. Additionally, we consider that some processing and storage capabilities(namely theprocessingandstoragecomponents) canbedeployed inboth the trustedanduntrustedsides.Finally,weconsideraproxycomponentthatisresponsibleforthecommunicationbetweenthetrustedanduntrusteddeployment.

InthenextsectionswewilldescribeindetailthethreeSafeCloudsolutionsforprivacy-preservingdatastorageandprocessing.Foreachoftheseproposedsolutions,wedefinetheprocessingresponsibilitiesassignedtothetrusteddeploymentandtotheuntrustedone.Wewillalsoinstantiatethecomponentsofthisgeneralarchitecturewithconcretesoftwareprotocols,anddetailtheinterfacesrequiredforthesolutiondeployment.

3 Solution1:Secureprocessinginasingleuntrusteddomain

3.1 Introduction

The first solution we consider is expected to achieve the highest performance whencompared to the alternatives described in this document. At the same time, it is thesolutionwhere the trustmodel is the strongestandwhere theamountof storageandprocessing capabilities delegated to the untrusted deployment is the lowest. Briefly,solution 1 assumes a deployment setting where the trusted deployment has somecomputationalpowerbutscarcestorageresources,while theuntrusteddeployment isassumedtohavebothcomplementarycomputationalpowerandthenecessarystoragecapabilities.

Page 17: Architectural and API proposal for the secure processing ... · privacy-preserving storage and computation1. Heavily rooted in the cloud computing paradigm, such architecture comprises

WP3–Privacypreservingstorageandcomputationarchitecture 17

As an application example for this solution, consider a laboratory that owns orcompletely trustsaprivatecloudwithsomecomputationalpower,but is interested inoffloadingmost of the storage effort into an untrusted cloud domain. The laboratorymanagessensitivedata,soitisnotacceptabletohaveitstored/processeditintheclearoutsideofwhatthelaboratoryconsiderstobeitstrusteddomain.

Asmentionedabove,weaimatofferingbothaSQLandaNoSQLinterfacetoSafeCloudclients.Leveragingwork frompreviousprojects it ispossible tooffera fullyANSISQLcompliantdatabasesystemontopofaNoSQLdatabase, inparticular,ontopofHBase.This is achieved by decomposing the database functionality into several componentsthat address self-contained challenges and allow thewhole system to scale. In detail,SQL queries are issued to an engine component that translates them to NoSQLstatementshandledbyHBase.Atthesametime,transactionalsupportisguaranteedbya separate component.Buildingon sucharchitecturewewillbeable tooffera secureNoSQLinterfacetoSafeCloudclientapplicationsand,ontopofsuch interface,offeranANSISQLone.

In this first solutionwe assume that the SQL query processing and the transactionalsupportprocessingcanbedoneentirelyonthetrusteddeployment.ThecryptographictechniquesareappliedovertheunderlyingNoSQLengine,inthesensethatthetrusteddeploymentwillberesponsiblefortheencryptionofdatabeforesendingittothecloudprovider,tobehandledbytheuntrusteddeployment.NotethatitisimperativeforthesetechniquestoallowtheNoSQLsystemtoprovidethenecessaryfunctionalitytosupporttheaforementionedANSISQLdatabasesystem.

Inthenextsection,wedescribeindetailthedesignandarchitectureofSolution1,anddiscussthetradeoffsbetweensecurityandperformancethatitentails.

3.2 ArchitectureandAPI

In Figure 5we depict the instantiated architecture for Solution 1. It is observablethatmostoftheprocessingisdoneinthetrusteddeployment.Almostallstorage,ontheother hand, is left under the responsibility of the cloud provider. In detail,responsibilitiesareassignedasfollows.

Page 18: Architectural and API proposal for the secure processing ... · privacy-preserving storage and computation1. Heavily rooted in the cloud computing paradigm, such architecture comprises

WP3–Privacypreservingstorageandcomputationarchitecture 18

Figure5:Solution1architecture

3.2.1 Trusteddeployment

In the trusted domain, the SafeCloud SQL engine will process full ANSI SQL withtransactional support and translate SQL queries into NoSQL statements. Thesestatementsarehandedovertoaproxycomponentthat isresponsible forapplyingtherequiredsecuritymechanismstothedata.

Astraightforwardsolutionregardingsecuritycould involvesimplyencryptingalldatawith standard cryptographic techniques, and therefore achieve a very strong level ofsecurity. However, this prevents the NoSQL database to be able to perform anymeaningful computation over the data, implying the unfeasible scenario inwhich anyquery over the encrypted information requires the proxy to retrieve the wholeencrypteddatabase,decryptitandprocessoverit.

One approach would be to apply cryptographic techniques only over sensitiveinformation, in a way that allows for the NoSQL database to perform the basiccomputationsrequired.Forinstance,theclientmightconsiderthevaluesofitsdatabase

Page 19: Architectural and API proposal for the secure processing ... · privacy-preserving storage and computation1. Heavily rooted in the cloud computing paradigm, such architecture comprises

WP3–Privacypreservingstorageandcomputationarchitecture 19

tobesensitive,butallowidentifierstobeleftintheclear.Obviouslythissimplescenarioisnotfeasibleinallcases,asthedatathattheuserconsiderstobenon-sensitivemightnotsufficeforourunderlyingNoSQL,e.g.identifierscanbeleftintheclear,butmostofthe required systemqueries aremadeover the sensitivedata.Oneway to extend theamountofnon-sensitive informationcouldbetoapplyanonymizationtechniquesoversomepartsof thesensitivedata.Assumingthattheclientconsidersthesemeasurestobesufficientforallowingpartsofitssensitivedatatoberevealedtothecloudprovider,thiswouldallowforagreaterportionofinformationtobestoredintheclear.

Alternatively,wecanalsoapplydifferentcryptographictechniquesoverdifferentlevelsof data sensitivity. As an example, consider order-preserving encryption (OPE)[BCL+09]. OPE is a deterministic encryption scheme whose encryption functionpreserves the numerical order of the plaintexts. In brief, if plaintextsm1 andm2 areencryptedintociphertextsc1andc2(respectively)withOPE,andm1>m2,thenc1>c2.Thismeans thatadatabaseholdingOPEciphertextscanefficiently (logarithmic in thesizeofthedatabase)performrangequeriesoveritsdata.TechniquessuchasOPErevealsome properties of the data they are encrypting, but conceal the actual values. Thistechnique is suitable when the client does not want to disclose its data but is notconcernedaboutdisclosingsomeofitsproperties(theorder,inourexample).

3.2.2 Untrusteddeployment

TheuntrusteddeploymentmainlyconsistsofaNoSQLdatabase,inparticularanHBasestore. Data stored in this HBase deployment can be accessed via the previouslydescribed put, get and scan interface. Note that in order to achieve the desiredperformance, the database system takes advantage of HBase processing capabilities.Namely,HBaseallowsquerying itsdataaccordingtoasetofcriteria that is translatedintodatafiltersrunningclosetothedataitself.Theabilitytoperformtheseprocessingsteps isdirectly impactedbythetypeofsecuritymechanismsdeployedbythetrustedproxy.Dependingonthesecurity techniqueused,moreor lessdataprocessingcanbedelegatedtotheHBaselayer.Thechosentechniquewill,naturally,dependnotonlyonthedesiredperformancebutalsoonthetrustmodelrequiredbytheapplicationanditsdata.

3.3 Discussion

Briefly, thesolutiondescribed inthissectionoffloadssomeof thedataprocessingandstorage done at the trusted domain to the untrusted domain by deploying a NoSQLdatabase in the remote site. Depending on the security mechanisms deployed theamount of processing done on the untrusted site varies. Considering that doing suchprocessingclosertothedataallowsforbetterperformance,itnaturallyfollowsthatthestrongerthesecurityrequired,thelessprocessingcanbedoneattheuntrustedsite,andthe lowest the performance is expected to be. By considering different cryptographictechniquesforthedifferentlevelsofsensitivedata,wearemaximizingourperformancepossibilities,whichcouldleadtosolutionsthatareastepforwardwithrespecttostateoftheartprivatedataprocessingsolutions.

Anapplicationexampleforthissolutionisalaboratorythatownsaprivateclusterwithsomecomputationalpower,butisinterestedinoffloadingmostofthestorageeffortintoanuntrustedclouddomain.Thelaboratorystoragesystemdealswithacertainamountofconfidentialinformationforeachpatient(asetofcolumnsandrespectivevalues,for

Page 20: Architectural and API proposal for the secure processing ... · privacy-preserving storage and computation1. Heavily rooted in the cloud computing paradigm, such architecture comprises

WP3–Privacypreservingstorageandcomputationarchitecture 20

instance), which is identified by a unique identifier (key). Publicly disclosing thisidentifierisnotanissuebecauseeitheritisalreadypublic,ortheinformationrequiredtodisclosetheactualpersonreferredbythatidentifierissafelystoredatthetrustedon-premises deployment. Considering this scenario, offloading the data to the untrusteddeployment demands for mechanisms that ensure the privacy of the patient data. Inother words, mechanisms that render all values to become opaque for the cloudprovider.ThiscanbeachievedbystoringdataintheNoSQLdatabaseusingkeysintheclear and values that completely hidden from the cloud provider using, for instance,standard cryptographic techniques. As a consequence, in this example, securitymechanismsarequiteinexpensiveand,becausedatakeysarestoredintheclear,alltheprocessingmechanisms ofHBase are preserved. In particular, the ability to scandataaccording to user defined criteria is preserved. Moreover, as HBase is deployed in acloudenvironment,thissolutionalsoallowstheusertotakeadvantageofitsscalabilityandelasticitypotential.Inmoredetail,byallowingdatakeystobestoredintheclearatthe untrusted NoSQL database, it is possible to retrieve specific keys and to performfilters over these keys in an efficient way and using the untrusted deploymentcomputationalpower.Additionally,sinceNoSQLallowsstoringthedataforeachpatientinadistinctcolumn,andsincethesecolumnsmaybegroupedinfamilies,itispossibletohave different encryption keys for each column family. This distinction allows us toestablishaccesscontroloveruserswithdifferentpermissionlevels(roles).Forinstance,intheclinicalanalysis laboratoryexample,supposethatapatienthasasetofanalysisthatcanonlybeseenbyitspersonaldoctor,butalsohasanothersetofdocumentsthatarevisibletoalltheclinicalstaff. Thesedocumentsmaybestoredindifferentcolumnfamiliesandwithdifferentencryptionkeys,sothatonekeyisavailabletoalltheclinicandallowsaccesstoonesetofdocuments,whiletheotherkeyisonlyavailableforthepatient’spersonaldoctor, thathas thepermission toaccess theotheranalysisset. Ontheotherhand, ifwe consider an identical examplewith theexception thatkeysnowmayleaksensitiveinformationandcannotbestoredintheclearattheNoSQLuntrusteddeployment, this solution also allows applying cryptographic techniques to the keys,such asOPE,while still preserving the computation capabilities ofHBase. To sumup,this solution will allow testing several cryptographic techniques in order to performsecureandefficientprocessingonasingleuntrustedNoSQLdeployment.Moreover,thesolutionwill leverage the processing capabilities available in a trusteddeployment toprovideafullANSISQLalongwithtransactionalsupporttoapplications.

ThemajordrawbackofSolution1isthedependenceonasinglecloudprovider(singleuntrusted deployment), which limits the security techniques that can be applied. InSolution2,SafeCloudaddressesthisissuebyconsideringthatdatawillnowbestoredinmultipleuntrusteddomains.

4 Solution2:Secureprocessinginmultipleuntrusteddomains

4.1 Introduction

The major difference from the previous solution is that this approach assumes theexistenceofmultipleuntrustednon-colludingdeployments.Leveragingthetrustmodelin this way allows for the usage of other security mechanisms incompatible withsolution 1, such as secret sharing schemes. Spreading data and computation acrossseveraldomainssothatasingledomaincannotcompromisetheprivacyofthedataheldbyitisoneofthecoreconceptsofSafeCloudthatwillbeinvestigatedinthissolution.As

Page 21: Architectural and API proposal for the secure processing ... · privacy-preserving storage and computation1. Heavily rooted in the cloud computing paradigm, such architecture comprises

WP3–Privacypreservingstorageandcomputationarchitecture 21

such, solution2 isbasedon the assumption that the trusteddeployment continues tohave scarce storage resources, while still having computational capabilities. Theuntrusteddeploymentscontinuetocomplementthetrusteddomainwiththenecessarystoragecapabilitiesandadditionalcomputationalcapabilities.Insummary,thissolutionaims at providing similar processing capabilities for the trusted and untrusteddeployments while reducing the information leakage that a single untrustedinfrastructurecouldexploit.

Theuntrusteddeploymentwillbecomposedofmore thanoneNoSQLdatabase,whileeach databasewill be deployed in different cloudproviders.Moreover, each databasewillonlycontainapartof theoriginal information,whichwillavoiddisclosingprivateinformationtotheuntrustedprovider.Inparticular,theinformationispartitionedinaway that prevents a single provider to retrieve anymeaningful information from thestoreddata.Additionally,specialsecuremultipartycomputationprotocolscanrunoverthis data, enabling for NoSQL operations to be performed in a secure fashion at theuntrusted deployments. This is key to support the desired functionality of delegatingcomputationtotheuntrusteddomain.

The architecture for this solution presents several challenges, both in terms ofdistributedsystemsandcryptography.Nonetheless,similarlytosolution1,theobjectiveis to offer both a SQL and NoSQL interface to the SafeCloud client applications. Toachievesuchgoal,theuntrusteddeploymentwillofferavirtualNoSQLinterfaceontopof threeHBasedatabases.Applicationsusingthis interfacewillbeabstractedfromtheunderlying data partitioning and from the number ofHBase replicas being used, thusseeingaregularsinglenodeHbaseAPI.OntopoftheNoSQLAPI,andbyrecurringtothetrusteddeploymentprocessingcapabilities,thissolutionwillbeabletoofferafullyANSISQL compliant database system with transactional support. Following the samestructure of the previous section, we now describe in more detail the design andarchitectureofsolution2.

4.2 ArchitectureandAPI

Figure6depictsthearchitectureforsolution2.Themajorarchitecturaldifferencefromsolution1 is thatnowthereareseveralHBase instanceshosted indifferentuntrustedinfrastructures,whilethetrustedproxyisnowresponsibleforcollectingandprocessingresponsesfromalltheseinstances.

Page 22: Architectural and API proposal for the secure processing ... · privacy-preserving storage and computation1. Heavily rooted in the cloud computing paradigm, such architecture comprises

WP3–Privacypreservingstorageandcomputationarchitecture 22

Figure6:Solution2architecture

4.2.1 Trusteddeployment

Thetrusteddeploymentremainsverysimilartotheonedescribedinsolution1,withtheSafeCloud SQL engine processing full ANSI SQL with transactional support andtranslatingSQLqueries intoNoSQLstatements.Thegreatest change in this layer is inthetrustedproxycomponent.Theproxymechanismisresponsiblefornotonlyapplyingthe required security mechanisms to the data but also to create a virtual andtransparentNoSQLinterfaceovertheNoSQLdatabaseinstances.

This architecture with several database instances responsible for performing securecomputation expands the range of cryptographic techniques that can be applied. Oneway to benefit from this would be to use secret sharing and secure multi-partycomputation protocols over the stored data. Loosely explaining, secret sharing is amethod for distributing a secret among several participants: each one will receive ashareofthissecret,eachsharedoesnotrevealanyinformationbyitself,butasufficientnumber of shares allow for the secret to be reconstructed [DPS+08]. In this specificimplementation, the proxy is responsible for generating and distributing the sharesassociated with sensitive data, as well as gathering and interpreting the shares itreceivesfromtheuntrusteddomains.

From the client perspective, we expect the usability experience (modulo theperformance toll thesemeasuresmay takeon the system) tobe the sameas if itwas

Page 23: Architectural and API proposal for the secure processing ... · privacy-preserving storage and computation1. Heavily rooted in the cloud computing paradigm, such architecture comprises

WP3–Privacypreservingstorageandcomputationarchitecture 23

interactingwitharegularSQL/NoSQLdatabase,dependingontheAPIprovided intheclientinterface.

4.2.2 Untrusteddeployment

OneoftheSafeCloudobjectivesistoexplorethepossibilitiesofrunningSQLorNoSQLqueries on data that is divided in several databases. Accordingly, in the solutionpresented in this section, theuntrusteddeploymentnowencompasses severalNoSQLdatabases,inthiscaseHBasestores,runningindifferentclouds.Again,wecannottrustanyofthecloudproviders,butweassumetheyarenotwillingtocollude. EachHBaseinstancewill be able toperformcomputationover storeddatawithoutdisclosing anymeaningfulinformationaboutthatdata.Forinstance,ifsecretsharingisbeingused,theHBaseinstanceswillbeabletotranslateNoSQLqueriesintoprotocolsthatareawareoftheunderlyingsecretsharingschemes.Thiswillalloweachinstancetoprocessdataandanswerthequerieswhilestillbeingunableto,individually,extractanyknowledgefromit.

Some computation is already possible in the distributed secret shared scenariowhenusingadditivesecretsharing[BNT+12].Thissecretsharingschemealreadyhasseveralsecure protocols that can compare values anddo arithmetic operations on shares. Toachievethis,thedatabasesgothroughseveralstepsofinformationsharing,whichallowthem to reach consensus about the stored data without ever knowing which are theoriginalsecrets.InSafeCloudtheaimistoextendthesetechniquestoHBasedatabasestoallowsecuredataprocessinginsuchsystems.

It is of special relevance that thesemechanisms canbe included in theHBase systemwithoutmodifying its core implementation. This can be achieved by using HBase co-processorsthat,forsimplicity,canbeseenaspluginsthatareimplementedandaddedtoHBaseinstances,andthatallowextendingthecomputationdonewhenNoSQLqueriesareperformed[APACHE12].

4.3 Discussion

Insummary,thesolutionpresentedinthissectionnolongerassumesthatdataisstoredin a single untrusted entity. This assumption allows increasing the provided securityguarantees and to employ new cryptographic techniques like secret sharingmechanisms.Inparticular,thesemulti-partycomputationalgorithmsoversensitivedatadon’trequireforsignificantpropertiesofthedatatoberevealed,aswasthecasewithorder-preservingencryption.

Ontheotherhand,secretsharingtechniqueswilladdextrastorageandcommunicationoverhead that will negatively impact the performance of solution 2, at least whencomparedwiththeperformanceachievablebysolution1.

Returning to the clinical laboratory example, this time the laboratory owns a privatecluster with limited computational power, and wants to offload storage andcomputationalpowertoseveralcloudproviders.Muchlikeinthepreviousscenario,thelaboratoryisnotwillingtodisclosepatientsensitiveinformation,butnowtheorderofthe sensitive information follows a predictable pattern, so the laboratory is notinterestedinorder-preservingsecuritymechanisms.Moreover,thelaboratorydoesnottrustasinglecloudprovidertoholdandprocessallthedata.However,itconsidersthe

Page 24: Architectural and API proposal for the secure processing ... · privacy-preserving storage and computation1. Heavily rooted in the cloud computing paradigm, such architecture comprises

WP3–Privacypreservingstorageandcomputationarchitecture 24

scenario of colluding competing cloud providers highly unlikely, so it is comfortablewithassumingitwillnotoccur.

By spreading the information in parts according to provably secure mechanisms ofsecretsharing,ithasassurancesthateachcloudproviderwillnotbeabletoretrievethesensitive information from what is stored within. Furthermore, the homomorphicpropertiesofthesesharesallowforcomputationoverthem,sothelaboratoryalsohasaccesstoprovablysecureprotocolsforprocessingoverthesecurelystoredshares,inawaythatthesecrecyofdataiskeptatalltimes.

Forabroaderscopeofreal-worldapplications,wemightbe interested inallowing forseveraluntrustedclients toqueryoverpotentially sensitivedata. In solutions1and2there is a single trusted domain that manages data in the clear, decides whatencryptions schemes to apply and that handles the translation of SQL toNoSQL. Thisdoes not consider the scenario in which several laboratories (with independentdatabasesholdingsensitivedata)wanttojointlycomputeovertheirdatasets.Thisisavery common use case when considering the relevance of subjects such as big data,where these laboratories could ascertain valuable statistical information from hugequantitiesofdata.Suchusecaseisexploredinsolution3presentednext.

5 Solution 3: Secure processing in multiple untrusted domains

withuntrustedclients

5.1 Introduction

In the first twosolutions it isassumed that there isonlyoneentity,whoownsall thedatastoredintheuntrusteddomain.Inthisthirdsolutionthisassumptionisdropped.Themotivationforgettingridofthisassumptioncomesfromtheneedtoanalyzedatafrommultiplesourceswhilekeepingitprivate.Awellknownexample,wheretherearemany data owners and shared analytics is needed, is the Yao'sMillionaires' problem[YAO+82].Inthissimplifiedcasetherearetwomillionaires,whowanttoknowwhoisricher.Theproblemisthattheydonotwanttorevealtheirnetworthtoanyone.Inthissimplifiedcase,wehavetwodataowners,themillionairesandtheneedforanalyticsontheshareddataofboth:whoisricher.Oneextrarequirementisthatthesystemshouldgive the answer to only authorized users, in this case themillionaires themselves. Athirdparty,forexamplethelocalnewspaper,shouldnotlearnwhichofthemillionairesisricher.

This means that even the authorized users should not be able to see all the data.Therefore,dataisprocessedintheuntrusteddomaininencryptedformonthebackendcomponents. Users cannotmake direct queries for data: only aggregation results arereturned.Itmightbethecasethatforrealworldapplicationsmorefine-grainedaccesspoliciesareneeded.Foranexamplecomputingminimumofsometablecolumnwouldrevealtoomuchandisnotallowed,butcomputingmeanofthesamecolumnisallowed.

Millionaires’problemisaclassicaltoyexample,buttherearemanymorecases,wheredifferent entities, who own sensitive data, could leverage the results of joint dataanalysis. Examples can be found where personal data is stored. For an example a

Page 25: Architectural and API proposal for the secure processing ... · privacy-preserving storage and computation1. Heavily rooted in the cloud computing paradigm, such architecture comprises

WP3–Privacypreservingstorageandcomputationarchitecture 25

hospitalhasmedical records,whicharehighlysensitive.However, to track largescaleviral outbreaks, different hospitals need some way to combine their data, whilepreservingtheirconfidentiality.

5.2 Architecture

ThearchitectureforSolution3isdepictedinFigure7.Inprevioussolutionsthetrusteddomainprocessingdidsomeactualprocessingof thedata. In thissolution the trusteddeployment is responsible for only encrypting the query parameters, decrypting theresults and orchestrating thework done in the untrusted domain. The front-end stilldealswithprocessing thequeryandexecuting iton thebackend,butunlike theothersolutions,onlythequerieddataissenttothetrusteddomain.Theonlydataprocessingthathappensinthetrusteddomainisencryption/decryptionofthedata.

In the sense of the general architecture in Section 2, there are no trusted domainprocessingorstorageinvolved.Theproxycomponentdoesencryption,decryptionandquerytranslationinadditiontocommunicatingwiththeuntrusteddomaincomponents.The solutionwill bebuilt on topof the Sharemind framework [SHAREMIND+08].TheSharemind framework is a programmable system for doing privacy-preservingcomputations. Although Sharemind supports different deployment schemes, themostevolvedprotocolsuiteisbasedonsecret-sharingbetweenthreeparties.Therefore,theuntrusted deployment consists of three Sharemind servers, each holding part of thesecret-shareddata.

Figure7:Solution3architecture.

It is required that three Sharemind Application Servers are hosted in differentadministrativedomains.Threedomainsjustneedtobedifferent,theyallcanbeintheuntrustedcloud.ThisistoavoidcollaborationbetweenentitiesthathosttheSharemindservers. Collaboration would break the confidentiality guarantees offered by secretsharing.

5.2.1 Proxycomponent

The proxy component provides a SQL interface to the secure processing system. Itparsesthequeriesandtranslatesthemtoaseriesofoperationsthatneedtobedoneonthe back-end. Some of these operations, like filtering, require input parameters. Thefront-endcomponentwillencrypttheseparameterswhenexecutingtheoperations.Theresultsoftheoperationsarestoredintemporarytables,whichstayintheback-end.Theproxy component only gets references to these tables. A special operation is used to

Page 26: Architectural and API proposal for the secure processing ... · privacy-preserving storage and computation1. Heavily rooted in the cloud computing paradigm, such architecture comprises

WP3–Privacypreservingstorageandcomputationarchitecture 26

return the final resultofaqueryback to theproxycomponent.TheproxycomponentwillthendecrypttheresultandoutputsittotheSQLinterface.

In Sharemind context theProxy component is just a specific application that uses theSharemind Controller library. The controller library has simple functions to runprograms on Sharemind deployment. The programs are written in a domain specificlanguage called SecreC. Programswritten in SecreChave special data types for secretshared data, and operations run on the datawill automatically use securemultipartycomputationprotocols.

5.2.2 Back-endComponent

Theback-endcomponent isasuiteof functionswritteninSecreC,whichrunontopofthe Sharemind Application Server. The suite has functions for basic operations likefilteringandgrouping tables.All theoperationsareexecutedusingsecuremulti-partycomputationprotocolsbasedonadditivesecret-sharingscheme.

SharemindApplicationServerhasadatabasecomponentwhichstoresencrypteddata.ItcurrentlyusesasimpleHDF5basedbackendasthestoragecomponent.HDF5isadatamodel, hierarchical file format and a library used mostly for scientific experiments[HDF5]. However, othermore interesting storage backends can also be implementedeasilythankstoSharemindApplicationServer’smodulardesign.

5.3 Discussion

The solutionwill not implement the full ANSI SQL standard, because someparts of itwouldbe inefficientordifficult to implementsecurely.Forsomeoperations likeEqui-joins we have efficient implementations. However, the ANSI SQL standard has someobscurepartsthatdonotfitintothesecureprocessingmodelverywell.

The performance of the solution will be highly dependent on the network latencybetween Sharemind servers. The latency and bandwidth between the proxy andSharemindserverswillnotbeasimportantfactor,asisthethelatencyandbandwidthbetweenSharemindservers.

This solution covers an important use case,where something has to be computed onconfidential data belonging tomultiple parties,who cannot trust each otherwith thedata.Sometimesthelegalaspectspreventgivingoutdatatootherparties.Othertimesbusiness secrets cannot be revealed to competitors. However useful statisticalinformation,thatallpartieswouldbenefitfrom,canbeobtainedfromshareddata.Thissolutionmakesitpossibletoobtainthebenefits,whilepreservingtheconfidentialityofdata.

Page 27: Architectural and API proposal for the secure processing ... · privacy-preserving storage and computation1. Heavily rooted in the cloud computing paradigm, such architecture comprises

WP3–Privacypreservingstorageandcomputationarchitecture 27

6 Conclusion

ThisfirstdeliverableofWP3aimsfortwomajorgoals.First,itpresentsanddetailstheunifyinggeneralarchitectureofSafeCloud.Second,itproposesthreemainsolutionsforprivacy-preservingstorageandcomputationbasedonthisarchitectureandfocusedonprovidingfeasibledeploymentstomeettherequirementsofSafeCloud’susecases.

Our proposed architecture is heavily rooted in the cloud computing paradigm,thereforemaking theexplicitdistinctionbetweenonpremisesinfrastructure andcloudinfrastructure. This led to a general solution architecture comprised of two maincomponents:thetrusteddeploymentandtheuntrusteddeployment,communicatingwiththe client via a client interface, and to each other via a proxy. By having these cleardistinctions,wecanspecify:

• TheAPI our solutionsprovide to the client, being either SQLorNoSQL - clientinterface.

• Thecomputationalrequirementsdelegatedtothelocalandremotecomponents-trusted/untrusteddeployment.

• Howtheclientsideistointeractwiththecloudprovider-proxy.

Our privacy-preserving solutions follow the described structure, and are derivedfromacomprehensiveanalysisofsecurityandperformancetrade-offs.SafeCloudavoidsthe problems inherent to a single-scenario focused approach by providing severalsolutions that meet different functional and security requirements. Our first solutionassumes a single untrusted cloud provider, and proposes the usage of differenttechniquesforvariouslevelsofdatasensitivity.Thesecondsolutionattemptstoextendthe functionalities of the first one by assuming a setting withmultiple non-colludinguntrusted cloud providers, employing security mechanisms that rely on this newdeployment to provide higher levels of security for storage and processing. Our thirdandfinalsolutionextendsthescopeofthesecondbyconsideringmultipleclientswithsensitivedata,whichallowsforcomputationsrelyingonhugeamountsofdata,suchasdataanalytics,toberunoverseveraldatasetsofdatawithoutrevealingsensitivedatatothe party requesting the computation (other than what would be revealed by thecomputationresultitself).

To sumup, thisworkpackage aims to gobeyond the state of the art to provide arangeofsolutionsthatfitapplicationswithdifferentrequirementsintermsofsecurityand efficiency, while allowing secure and private data processing in untrustedenvironments.

Page 28: Architectural and API proposal for the secure processing ... · privacy-preserving storage and computation1. Heavily rooted in the cloud computing paradigm, such architecture comprises

WP3–Privacypreservingstorageandcomputationarchitecture 28

7 References

[MSL+11] Martin Mulazzani, Sebastian Schrittwieser, Manuel Leithner, and MarkusHuber(2011)."DarkCloudsontheHorizon:UsingCloudStorageasAttackVectorandOnlineSlackSpace".USENIXSecuritySymposium.

[AFG+10] Michael Armbrust, Armando Fox, Rean Griffith, Anthony D. Joseph, RandyKatz,AndyKonwinski,GunhoLee,DavidPatterson,ArielRabkin, IonStoica,andMateiZaharia(2010).Aviewofcloudcomputing.Commun.ACM53.

[Dropbox12]Dropbox(2012). ”Dropboxhasbecome"problemchild"ofcloudsecurity…”,(http://venturebeat.com/2012/08/01/dropbox-has-become-problem-child-of-cloud-security/)

[CK08] Octavian Catrina, and Florian Kerschbaum (2008). "Fostering the uptake ofsecure multiparty computation in e-commerce”. Third International Conference onAvailability,ReliabilityandSecurity.

[L13]SusanLandau(2013)."MakingsensefromSnowden:What'ssignificantintheNSAsurveillancerevelations”.IEEESecurity&Privacy.

[D+14]ZakirDurumericetal(2014)."Thematterofheartbleed”.ConferenceonInternetMeasurementConference.

[LCW93]HongjunLu,HockChuanChan,andKwokKeeWei.1993.“AsurveyonusageofSQL”.SIGMODRec.22.[HHL+11] Jing Han, E. Haihong, Guan Le, and Jian Du (2011). "Survey on NoSQLdatabase”.Pervasivecomputingandapplications.

[ANSI16]AmericanNationalStandardsInstitute.ANSIWebpage.(http://www.ansi.org)[DD+93]ChrisJ.Date,andHughDarwen(1993).“AguidetotheSQLStandard:auser'sguidetothestandardrelationallanguageSQL”.Vol.55822.Addison-WesleyLongman.[APACHE16] Apache HBase Team (2016). “Apache HBase ™ Reference Guide”.(https://hbase.apache.org/book.html)

[BCL+09]AlexandraBoldyreva,NathanChenette,YounhoLeeandAdamO’Neill(2009)."Order-preservingsymmetricencryption”.AdvancesinCryptology-EUROCRYPT.

[DPS+08]MahirDoganay,ThomasPedersen,YücelSaygın,ErkaySavasandAlbertLevi(2008). "Distributed privacy preserving k-means clustering with additive secretsharing."InternationalworkshoponPrivacyandanonymityininformationsociety.

[APACHE12] Apache HBase Team (2012). “Coprocessor Introduction”.(https://blogs.apache.org/hbase/entry/coprocessor_introduction)

[CNIMBO09] CumuloNimbo european project partially funded by the EuropeanCommissionunderthe7thEuropeanFrameworkcontractnumberFP7-257993(2009).(http://www.cumulonimbo.eu/node/20)

[CPAAS14] CoherentPaaS european project funded by the European Union's SeventhFramework Programme for research, technological development and demonstrationundergrantagreementno611068(2014).(http://coherentpaas.eu)

[YAO+82] Yao, Andrew C. "Protocols for secure computations." Foundations ofComputerScience,1982.SFCS'08.23rdAnnualSymposiumon.IEEE,1982.

Page 29: Architectural and API proposal for the secure processing ... · privacy-preserving storage and computation1. Heavily rooted in the cloud computing paradigm, such architecture comprises

WP3–Privacypreservingstorageandcomputationarchitecture 29

[BNT+12] Dan Bogdanov, Margus Niitsoo, Tomas Toft, Jan Willeson (2012). “High-performancesecuremulti-partycomputationfordataminingapplication”.InternationalJournalofInformationSecurity.[SHAREMIND+08]DanBogdanov, SvenLaur, JanWillemson. Sharemind: a frameworkforfastprivacy-preservingcomputations.InProceedingsof13thEuropeanSymposiumon Research in Computer Security, ESORICS 2008, LNCS, vol. 5283, pp. 192-206.Springer,Heidelberg.2008.

[HDF5]HDF5homepage.(https://www.hdfgroup.org/HDF5/),2016