Securing the Cloud with Client-Side Encryption: Benefits and Challenges

PROFESSOR DAVID CASH, UNIVERSITY OF CHICAGO

2019-02-26

IONIC SECURITY INC. | 1170 PEACHTREE STREET NE, SUITE 400 ATLANTA, GA 30309 | ionic.com

COPYRIGHT 2019 IONIC SECURITY INC. ALL RIGHTS RESERVED


SECURING THE CLOUD WITH CLIENT-SIDE ENCRYPTION: BENEFITS AND CHALLENGES

Outsourcing data to the cloud is often cheaper, more dependable, and more convenient than in-house solutions, but it also comes with risk. Services storing sensitive data (SSNs, biometrics, customer data, intellectual property) are targets for hackers, company insiders, and subpoenas by foreign governments. Standard encryption (e.g. TLS, disk encryption, database encryption) protects against some threats but is not effective in the face of total cloud compromise, when an attacker accesses decryption keys stored by the service. An emerging and more secure alternative is client-side encryption, where client data is encrypted under a key that is not given to cloud servers. When client-side encryption is enabled, even a powerful attacker or insider can extract only ciphertexts from servers, and not decryption keys. Client data remains confidential as long as individual endpoints are not compromised.

Despite its benefits, several challenges impede the deployment of client-side encryption, and it is uncommon today outside of simple services like cloud backup. Denying servers access to client data impacts some business models, and it also means that clients are responsible for key management, as the service can’t recover data when a client loses its keys. However, a more crucial issue is simply that services expect access to plaintext client data in order to work properly, and client-side encryption gets in the way.

An important instance of this problem is server-side searching. Almost all cloud applications today provide a search interface that finds information quickly in response to queries from clients who need to navigate their large data stores. A user of, say, Box, Dropbox Business, Salesforce® or SharePoint CRM, or a similar service expects to immediately jump to a relevant document by typing some keywords into a search box. The reason client-side encryption breaks searching is simple: The service cannot find documents matching a query if it can’t decrypt them. Existing client-side encryption services like SpiderOak and Sync notably do not provide server-side searching, presumably for this reason, which diminishes usability for large file stores that are not mirrored on the client.

This document discusses practical techniques for “squaring the circle” to enable server-side search on encrypted data without the pitfalls of other proposed partial solutions. We call this approach Ionic Encrypted Search (or “ionES™”). Several solutions explored in other research papers offer subtle tradeoffs between security, search quality, performance, and implementation complexity. Here we will cover the most practical of these approaches, due to Ionic Security, which combines strong security with true practicality. All of the constructions we consider are built using standard tools (e.g. AES and HMAC), avoiding new or exotic primitives, and allow the documents themselves to be encrypted under any preferred standard approach. The constructions are transparent and usable for clients, with expressive search interfaces and capabilities to batch-ingest data, merge indexes, and more, as one would with plaintext data. Moreover, deployment requires the client to operate only a lightweight and stateless service, while all of the heavy-lifting work is deferred to untrusted external servers.

We mention in passing that these techniques are distinct from heavier-weight approaches like homomorphic encryption, private information retrieval (PIR), and oblivious RAM (ORAM). While all of these technologies can in principle be used to navigate encrypted data, our solutions will focus on custom-built tools for Ionic Encrypted Search that are much simpler, generically useful, and more deployable in the near term. Further, if those heavier-weight approaches become more practical, or new approaches are discovered, this framework will be able to incorporate them without having to start from scratch.

BASICS OF ENCRYPTED SEARCH

Encrypted search is accomplished via a family of techniques that can be adapted depending on functionality, security, and performance requirements. We start with an overview of the setting for simple encrypted search techniques as well as some of the high-level configurations that are possible.

Threat model. The technologies in this document target confidentiality and authenticity of client data, even in the face of full cloud service compromise. An attacker inside a compromised service is assumed to see everything that is uploaded from clients and also to fully control servers. Such an attacker may attempt to extract document contents or information about client behavior. In particular, an attacker may want to learn the search queries themselves (e.g. that a client just searched for “payroll smith january”). Thus, encrypted search solutions aim to keep both document contents and queries private from a fully compromised service.

Integration. For the client, it is simplest if the cloud provider integrates encrypted search directly into a client-side encrypted product. In this architecture, the client infrastructure retains and manages decryption keys for cloud-stored files, and the cloud service provides interfaces for creating, editing, and searching encrypted documents. These interfaces will in fact be cryptographic protocols that work without revealing plaintext to the service, and during document uploading and editing some encrypted helper information will be sent to the cloud to enable later secure searches. Since search queries themselves are considered sensitive, the searching protocol must hide query contents such as keywords, and the challenge is for the search to return the correct ciphertexts without telling the server what was queried.

A version of this architecture, and associated flows for a search, is diagrammed in Figure 1. The client infrastructure runs a lightweight server (“OPRF server”) and plugins on endpoint devices, while the cloud service holds encrypted documents and encrypted helper information. A client endpoint performs an encrypted search by first obtaining a secured search query from the enterprise’s OPRF server, and it then interacts with the cloud service using the secure search query to obtain references to encrypted documents.

An alternative approach is to add client-side encryption and encrypted search to an existing cloud product via a third-party service, which is also untrusted and considered compromised. In this setting the cloud service (e.g., Dropbox) is unmodified and does not need to even know that encryption is being used. Documents are encrypted at the client, and the cloud service only sees ciphertexts, while the third party only holds encrypted helper information. Adding encryption will break the editing and search interfaces the cloud service provides. Instead, the client uses the third party to enable document editing and searching. Whenever the client wants to change a document, it interacts with the cloud service to change the document ciphertext, and also with the third party to update any encrypted helper information. To run a search query, the client contacts the third party to run a cryptographic protocol to learn which files match the query, and finally returns to the service to retrieve the actual ciphertexts.

In this setting, it is important that the third-party service is not trusted with document contents. As we detail below, it will only be trusted with encrypted helper information that allows for query processing with minimal information about the query and plaintexts.

It is also possible that the third-party service could instead be hosted in client infrastructure, and thus it could be more trusted. This may sometimes be appropriate, but burdens the client with administering an additional, complicated on-premise service that holds a large data store and stands in the critical path for each usage of the cloud service.

Usability and performance. For an end-user, client-side encryption and encrypted search should be totally invisible. Either by service integration or third-party protocols in the background, the search box should work as expected, preserving most existing workflows.

Figure 1: Architecture Supporting Practically Deployable Ionic Encrypted Search

In contrast to theoretical (and heavy-weight) technologies like homomorphic encryption, the methods of Ionic Encrypted Search we discuss here only incur a modest compute and network overhead compared to plaintext search. In particular, the documents themselves can be encrypted using any standard method, and the encrypted search will perform similarly to plaintext search, using indexes to avoid linear scans of all the documents. The solutions we consider in this document are all constructed using only standard cryptographic tools, and the actual encryption and decryption operations should introduce no significant latency.

In a service that natively integrates Ionic Encrypted Search, documents can be found with no noticeable delay. When the capability is added via an additional third party, there is an extra round of communication during the search which will be fast in the common case, but adds network latency and an extra potential point of failure. In both cases, the storage burden on the service will be higher because encryption prevents server-side compression and deduplication with current technologies. This may mean the encrypted documents consume more storage at the service but should not affect the user experience.

In addition to speed, one should also consider the type of queries supported. Modern text search systems provide a variety of features like query results ranking, substring and phrase searching, and sometimes use similarity analysis to improve the results. Encrypted search systems may support most, but not all, of these features, depending on how they are configured.

Security. Perfect security in our very strong threat model is essentially impossible, in the sense that a cloud service will always learn something about client data. Depending on how document encryption is implemented, the host may learn the number of documents and their approximate lengths. This information should be harmless in most settings but should be noted when evaluating security.

Encrypted search implementations all “leak” some statistical information about the underlying documents when a search is processed. As an example, consider a user who queries for a keyword (e.g. “payroll”) with an encrypted search system. While processing this search, the service will ship some document ciphertexts (or references thereto) back to the user. By design, the query plaintext “payroll” and the document contents are hidden, but the service does learn which documents are returned. If the user later queries for a different keyword (e.g. “vacation”), then in most implementations the service will again see which documents are returned. It is conceivable that such statistics may occasionally lead to a query keyword being guessed. Thus, it is important that designers of encrypted search systems be aware of this leakage and consider implementing countermeasures, such as dummy documents, to make statistical analysis more difficult.

Additional features. Client-side encryption and encrypted search enable fine-grained control of client data beyond just confidentiality from the cloud. An enterprise client can enforce access control for its employees via key management that is independent of the service provider. It is possible to restrict the search queries issued by employees according to a policy, enforced by restricting access to the generation of secured search queries. We will also discuss below a method for mitigating even the threat of compromised servers within an enterprise client’s architecture. In this setting, the client will typically run a key server that participates in protocols between endpoint devices, the cloud service, and the third party. Compromising the key server gives an attacker access to the encrypted documents, but a carefully designed protocol can still protect information at the endpoint, including query plaintexts.
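To make the leakage described under Security above concrete, here is a minimal Python sketch of what a compromised service could infer from result counts alone, and how padding with dummy documents blunts that inference. Everything in it is illustrative rather than part of the Ionic design: the key, the function names (`token`, `observed_counts`, `guess_keywords`, `pad_with_dummies`), the sample index, and the assumption that the attacker already knows how many documents contain each keyword are all hypothetical.

```python
import hashlib
import hmac
from collections import Counter

KEY = b"hypothetical client-side secret"  # assumption: never leaves the client

def token(keyword: str) -> str:
    """Opaque search token: a keyed hash (HMAC-SHA256) of the keyword."""
    return hmac.new(KEY, keyword.encode(), hashlib.sha256).hexdigest()

def observed_counts(index: dict) -> dict:
    """What the service sees per query: opaque token -> number of results."""
    return {token(kw): len(ids) for kw, ids in index.items()}

def guess_keywords(observed: dict, prior_counts: dict) -> dict:
    """Attacker matches observed result counts against known keyword frequencies.
    A token is guessed only when its count is unique on both sides."""
    by_count = {}
    for kw, n in prior_counts.items():
        by_count.setdefault(n, []).append(kw)
    seen = Counter(observed.values())
    return {t: by_count[n][0] for t, n in observed.items()
            if seen[n] == 1 and len(by_count.get(n, [])) == 1}

def pad_with_dummies(index: dict, first_dummy_id: int = 1000) -> dict:
    """Pad every row to the longest row's length with dummy document IDs."""
    width = max(len(ids) for ids in index.values())
    padded, next_id = {}, first_dummy_id
    for kw, ids in index.items():
        fill = list(range(next_id, next_id + width - len(ids)))
        next_id += len(fill)
        padded[kw] = ids + fill
    return padded

index = {"payroll": [1, 4, 7, 9], "vacation": [2, 4], "merger": [3]}
prior = {kw: len(ids) for kw, ids in index.items()}  # attacker's background knowledge

leaky = guess_keywords(observed_counts(index), prior)
safer = guess_keywords(observed_counts(pad_with_dummies(index)), prior)
print(len(leaky), len(safer))
```

In this toy run the attacker labels every token from counts alone, and none once all rows are padded to the same length. Real attacks can also exploit which documents co-occur across queries, so padding counts is only a partial countermeasure, and the client has to discard dummy results after decryption.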

HOW ENCRYPTED SEARCH WORKS: BASIC TECHNIQUES

First, a flawed straw man. How can one build a service with client-side encryption that still allows searching? Let us start with some straw man solutions where one naively adds client-side encryption to, say, an existing file backup service. This could be accomplished via a proxy that encrypts client data before shipping it to the service. With this form of client-side encryption, the cloud service search interface will be completely broken because the ciphertexts will be meaningless. Instead, search could trivially be performed by mirroring (or re-downloading) the data on premises at the client, where the search mechanism can be re-implemented and spliced into the client interface. A slightly better approach could arrange to store only indexing information at the client rather than all the data, but for text data, an index may not be much smaller than the data itself. In either case, however, the cost to purchase and maintain a client-side mirror or index (which must comply with relevant regulations) will likely outweigh the benefit of using the cloud service, since the data is still stored at the client.

Indexes and how to securely outsource them. We would instead like to leverage the benefits of cloud services and client-side encryption without the impracticality of a heavy client-side implementation. The high-level idea is to first encrypt the documents (or files, records, etc.) using standard strong encryption (fitting compliance and functionality requirements as needed), and store them at the service. Second, on the client side, we perform a lightweight operation of creating an encrypted index that is also held by the service (or an additional untrusted third party). The encrypted index will be designed to protect the data while allowing for search when permitted by the customer.

Keyword    Document IDs      Keyword    Encrypted Document IDs
dog        2,4,5,6,7,9,10    f1115d3a   e8ea75670b505cbe
cat        1,3,5,9           6947cd14   033777b4
squirrel   1,5,7,8,9         7694e4b2   b2833e3e04c
bird       2,3,5             a443576e   ec0a1f

Figure 2: A standard index for keyword search (on the left), and an encrypted version (on the right).

In Figure 2, the index on the left is an example of the standard technique for enabling fast keyword search (and more). In response to a keyword query for “dog”, one uses the table to immediately recall which documents (via their identifiers, or pointers) contain that keyword. It is tempting to simply upload this index along with the encrypted documents. However, this exposes a lot of information to the service, which can piece together documents as “unordered sets of words” using the index, and allows for an extensive analysis of the stored contents. Thus, we need to hide the indexing information, which is what an encrypted index is designed to do.

Basic table-based encrypted indexes. Multiple approaches are possible for building an encrypted index. We start with the simplest, which maintains the index row structure, and is visualized on the right of Figure 2. In this version, the keywords themselves are randomized (via a keyed hash), and the lists of matching documents are encrypted under keys that are known only to the client. The right side of the figure shows what the untrusted server holds: hashing (with a key) hides the keywords, and per-row encryption hides the document identifiers.

On its own, the encrypted index doesn’t allow a server to search encrypted documents. When a client wants to search for “dog”, it recomputes the keyed hash of “dog”, and then sends the hash along with a decryption key for that row to the server. Together, these values form a “secure search query”. Upon receiving the secure search query, the service can then look up the corresponding row using the hash, decrypt the corresponding document identifiers using the key, and retrieve the encrypted documents for the client.

Basic table-based encrypted indexes provide fast searching and are simple to implement, but come with some subtleties. An adversary who captures the index can learn more information about the documents than we’d ideally like. For example, the number of rows reveals how many distinct keywords are indexed, and the length of each encrypted row roughly tracks how many documents match that keyword. Also, updating the index securely may be challenging, as a newly added document will incur writes on exactly the rows that correspond to its keywords.
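The basic table-based scheme described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Ionic implementation: the function names, the key-derivation labels, and the JSON row encoding are hypothetical, and the row cipher is an HMAC-derived keystream used as a stand-in for AES so that the example needs only the standard library. It provides no integrity protection and must not be used as-is.

```python
import hashlib
import hmac
import json

def prf(key: bytes, label: bytes, msg: bytes) -> bytes:
    """Keyed hash (HMAC-SHA256) with a domain-separation label."""
    return hmac.new(key, label + b"|" + msg, hashlib.sha256).digest()

def keystream_xor(row_key: bytes, data: bytes) -> bytes:
    """Toy stream cipher (stand-in for AES): XOR with an HMAC-derived keystream.
    Encryption and decryption are the same operation."""
    stream, counter = b"", 0
    while len(stream) < len(data):
        stream += hmac.new(row_key, counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

def build_encrypted_index(master_key: bytes, index: dict) -> dict:
    """Client side: {keyword: [doc ids]} -> {hashed keyword: encrypted row}."""
    enc = {}
    for keyword, doc_ids in index.items():
        kw = keyword.encode()
        row_label = prf(master_key, b"label", kw).hex()   # hides the keyword
        row_key = prf(master_key, b"rowkey", kw)          # per-row decryption key
        enc[row_label] = keystream_xor(row_key, json.dumps(doc_ids).encode())
    return enc

def secure_query(master_key: bytes, keyword: str) -> tuple:
    """Client side: the (hash, row key) pair forming a secure search query."""
    kw = keyword.encode()
    return prf(master_key, b"label", kw).hex(), prf(master_key, b"rowkey", kw)

def server_search(enc_index: dict, query: tuple) -> list:
    """Server side: find the row by hash, decrypt it with the supplied row key."""
    row_label, row_key = query
    row = enc_index.get(row_label)
    return [] if row is None else json.loads(keystream_xor(row_key, row))

master_key = b"\x01" * 32  # hypothetical; a real key would be random and secret
plain_index = {"dog": [2, 4, 5, 6, 7, 9, 10], "cat": [1, 3, 5, 9],
               "squirrel": [1, 5, 7, 8, 9], "bird": [2, 3, 5]}
enc_index = build_encrypted_index(master_key, plain_index)
print(server_search(enc_index, secure_query(master_key, "dog")))
```

The server never sees the master key: it receives only the row hash and that one row's key, so rows for keywords that were never queried stay opaque. Because each row key encrypts exactly one row in this sketch, updating a row would require a fresh key or a nonce, which is one face of the update subtlety mentioned above.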

TWO NEW TYPES OF ENCRYPTED INDEXES: HIIT AND FIG

Advanced table-based encrypted indexes. More secure, and still fast, approaches adapt the table-based method to mingle the index rows into randomly jumbled lists that are linked with encrypted pointers. A specific combination that we call a Hash and Inverted Index Table, or HiiT, does just this via a combination of data structures that allows for fast, flexible searching while also limiting potential information leakage. In addition to storage optimizations, a HiiT is also built to support a sort of general “oblivious navigation” framework for the server, which allows for the embedding of a variety of standard data structures (e.g. lists, trees, skip lists) that can be selectively opened for an untrusted server. Using this framework one can customize the supported search features for different settings.

Basic filter-based encrypted indexes. The table-based idea results in fast search that is sublinear in the number of documents. However, it can require a lot of space to store the index compared to a standard index. The primary difference is that compression is no longer effective for a table-based encrypted index, while standard indexes built with industry-grade tools like Elasticsearch or Solr will compress tables to a fraction of their original size. In a table-based encrypted index we can’t apply compression to the entire table, because each row must be decryptable on its own.

There is another, more space-efficient way to build encrypted indexes that uses a common data structure called a set-membership filter (often simply called a filter), which succinctly represents a set of objects and supports queries of the form “Does x belong to the set?” By exploiting the tradeoffs carefully, set-membership filters can represent a large set using a very small amount of space. A straightforward way to support search without encryption involves creating a set-membership filter for each file, where the set represented is simply the set of keywords in the file. To search for a keyword, one tests each filter to see if the keyword is a member of the set. This takes linear time, as each document must be tested individually, but saves on space because the filters are small.

A basic encrypted version of filter-based searching is not much more complicated. Instead of representing the set of keywords in a document (e.g. {dog, cat, squirrel}), one replaces the keywords with keyed hashes (e.g. {f1115d3a, 6947cd14, 7694e4b2}). The client retains, and keeps private, the key used to produce the hashes. A search for dog is simply replaced by a search for f1115d3a, which can be computed by the client who knows the hashing key.

Advanced filter-based encrypted indexes. The technique above saves space by using filters, but pays for it with slower searching. Instead we’d like the best of both worlds, or at least something close in terms of small storage and fast computation during searches. Fortunately, an effective trade-off is possible via a new tree structure that we call a Filter-Gradient tree, or FiG tree. The goal of a FiG tree is to succinctly represent the sets of keywords in documents while enabling sublinear search. At a high level, FiG trees work by placing the per-document filters on the leaves of a tree data structure, and then computing helper filters on the other nodes of the tree. The helper filter at each node represents the union of the sets in its child nodes. With this recursive construction, the root node then represents all keywords in the document corpus. Searching now starts at the root and, whenever a keyword is found in the current set, recurses to the child nodes until it finds the leaves corresponding to matching documents.
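The filter-based ideas above, per-document filters over keyed hashes and a tree of union filters searched from the root, can be sketched as follows in Python. This is a simplified illustration and not the actual FiG tree construction: it uses a plain Bloom filter as the set-membership filter, the same filter size at every level, and a binary tree. The parameters `M_BITS` and `K_HASHES` and all function names are hypothetical.

```python
import hashlib
import hmac

M_BITS = 1024   # filter size in bits (hypothetical parameter)
K_HASHES = 3    # bit positions set per keyword (hypothetical parameter)

def positions(key: bytes, keyword: str) -> list:
    """Client side: derive K_HASHES bit positions from keyed hashes of a keyword."""
    out = []
    for i in range(K_HASHES):
        digest = hmac.new(key, bytes([i]) + keyword.encode(), hashlib.sha256).digest()
        out.append(int.from_bytes(digest[:4], "big") % M_BITS)
    return out

def make_filter(key: bytes, keywords) -> int:
    """Bloom filter for one document, stored as an integer bitmask."""
    bits = 0
    for kw in keywords:
        for p in positions(key, kw):
            bits |= 1 << p
    return bits

def build_tree(leaf_filters: list) -> list:
    """Levels from leaves up to the root; each parent is the bitwise OR
    (set union) of its one or two children."""
    levels = [leaf_filters]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([prev[i] | (prev[i + 1] if i + 1 < len(prev) else 0)
                       for i in range(0, len(prev), 2)])
    return levels

def search(levels: list, query_positions: list) -> list:
    """Server side: descend from the root, pruning subtrees whose filter misses."""
    mask = 0
    for p in query_positions:
        mask |= 1 << p

    def walk(level: int, idx: int) -> list:
        if idx >= len(levels[level]) or (levels[level][idx] & mask) != mask:
            return []
        if level == 0:
            return [idx]  # leaf index identifies a (possibly) matching document
        return walk(level - 1, 2 * idx) + walk(level - 1, 2 * idx + 1)

    return walk(len(levels) - 1, 0)

key = b"\x02" * 32  # hypothetical client-held hashing key
documents = [{"dog", "cat"}, {"bird"}, {"dog", "squirrel"}, {"cat"}, {"bird", "squirrel"}]
levels = build_tree([make_filter(key, doc) for doc in documents])
hits = search(levels, positions(key, "dog"))  # server only sees bit positions
print(hits)
```

Bloom filters never produce false negatives, so documents 0 and 2 are always among the hits for “dog”. They can produce false positives, so a client should expect an occasional extra result and discard it after decrypting the document.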

FiG trees require slightly more storage than basic filter-based search due to the internal nodes, but can be carefully parameterized to be competitive with basic filter-based indexes. FiG trees beat linear-time searching because each result requires (at most) a logarithmic number of lookups while walking down the tree. This is slightly slower than the constant-time look-ups of table-based indexes, but still exponentially faster than a linear-time search.

Figure 3: An example of a (padded) FiG tree

Additional protection for client-side compromise. Client machines must have access to the keys used for generating secured queries and the document keys used to encrypt their files, so we have thus far considered the client infrastructure to be trusted and secure. Nonetheless, an attacker may target client machines within a large organization to obtain keys, and systems should be designed to limit the damage of such attacks.

Distributing trust amongst pieces of client infrastructure allows some information to remain private when some endpoints are compromised. In the Ionic solution, if an end-user laptop is captured, an attacker will not obtain the key for generating queries unless it also compromises the client OPRF service. On the other hand, a compromise of the OPRF service does not expose document encryption keys. Standard security policies and key management systems such as the Ionic Platform can be used to restrict access to individual document keys and further limit the damage caused by machine compromises.

By using more advanced techniques from secure multiparty computation, we can maintain confidentiality of queries even when the client key server is compromised. In a naive implementation, a client endpoint will submit its plaintext query to the OPRF service and then receive a response that it can transform to a secured query. Using a tool called an Oblivious PRF, the service can provide the secured query without ever learning the plaintext of the query (so an attacker in the service won’t learn it either). The details of the protocol are beyond the scope of this document, but the protocol itself can be built using standard public-key cryptography libraries and introduces only a small computational overhead compared to the less secure option. [1]

[1] See https://eprint.iacr.org/2017/111 for details.
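As a rough illustration of how an Oblivious PRF lets a service help compute a keyed value without seeing the input, the following Python sketch uses a common Diffie-Hellman-style blinding construction. It is an assumption that this resembles the protocol referenced above; the source does not spell the protocol out. The group parameters are toy-sized (a 10-bit safe prime) purely so the arithmetic is easy to follow, and all names are hypothetical. A real deployment would use a vetted library and a standard elliptic-curve group.

```python
import hashlib
import secrets

P = 1019  # toy safe prime, P = 2*Q + 1; far too small for real use
Q = 509   # prime order of the subgroup of squares modulo P

def hash_to_group(keyword: str) -> int:
    """Map a keyword into the order-Q subgroup: hash to [1, P-1], then square."""
    h = int.from_bytes(hashlib.sha256(keyword.encode()).digest(), "big")
    return pow(1 + h % (P - 1), 2, P)

def blind(keyword: str) -> tuple:
    """Endpoint: hide the keyword behind a random exponent r before sending."""
    r = secrets.randbelow(Q - 1) + 1  # r in [1, Q-1], invertible modulo Q
    return r, pow(hash_to_group(keyword), r, P)

def server_evaluate(server_key: int, blinded: int) -> int:
    """OPRF service: apply its secret exponent to a value it cannot interpret."""
    return pow(blinded, server_key, P)

def unblind(r: int, evaluated: int) -> int:
    """Endpoint: strip the blinding exponent, leaving H(keyword)^server_key."""
    r_inv = pow(r, -1, Q)  # modular inverse; requires Python 3.8+
    return pow(evaluated, r_inv, P)

def finalize(keyword: str, element: int) -> str:
    """Endpoint: hash the result into a fixed-length secured query token."""
    return hashlib.sha256(keyword.encode() + element.to_bytes(2, "big")).hexdigest()

server_key = secrets.randbelow(Q - 1) + 1  # held only by the OPRF service
r, blinded = blind("payroll")
element = unblind(r, server_evaluate(server_key, blinded))
secured_query = finalize("payroll", element)
print(secured_query)
```

The value the service receives is the hashed keyword raised to a fresh random exponent, so repeated queries for the same keyword look unrelated to it, yet the endpoint always recovers the same token for the same keyword and server key.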

SUMMARY

Client-side encryption greatly reduces the need to trust the cloud by protecting data from malicious administrators, external hackers, and hostile governments. Technologies like encrypted indexes provide rich search interfaces to navigate encrypted data, meaning clients can enjoy the security provided by a layer of encryption without changing how they use the cloud service. Crucially, Ionic Encrypted Search or ionES™ via the methodologies introduced in this paper is efficient and practical enough to be used today, unlike more theoretical technologies. Ionic Encrypted Search can be built into a variety of architectures with different data structures to support varying performance and search functionality requirements. Cloud providers can incorporate Ionic Encrypted Search directly into their products, or clients can even add the layer to existing services themselves while only administering a lightweight, stateless helper server. Academic and industry research has made progress in finding the best approach for designing encrypted search, and now a variety of tools are available to be adapted and optimized for settings with varying security and functionality requirements. By combining well-studied security and easy deployability, Ionic Encrypted Search is a major step towards enabling client-side encryption and building a more secure cloud.

Figure 4: Example Usage of Ionic Encrypted Search, with 2 Companies Leveraging Multiple Deployment Options

NOTICES

Ionic Security®, the Ionic Security logo, ionES™ and Ionic.com are trademarks or registered trademarks of Ionic Security Inc. or its affiliates in the U.S. and other countries.

The products and services described in this white paper overview are distributed under separate subscription and license agreements restricting their use, copying, distribution, and decompilation or reverse engineering. This document grants no rights or licenses to any recipient, and no part of this document may be reproduced or distributed in any form by any means without prior written consent of Ionic Security.

This document is for information purposes only and the information is provided “as is”. The information contained in this document is subject to change without notice.