accumulo summit 2015: attempting to answer unanswerable questions: key management in accumulo for...
TRANSCRIPT
Slide 1. © Cloudera, Inc. All rights reserved.
Key Management in Accumulo for Encryption at Rest
Anthony Young-Garner
Slide 2
Past and future threats, a refresher
[Diagram. Components: Client Machines, Accumulo Cluster, HDFS Cluster, Zookeeper Cluster. Actors: Sven (User), Jo (User), Molly (Network Admin), William (Accumulo Admin), Halim (HDFS Admin). Label: Trusted Zone (implicit).]
Image design credit: Michael Allen, see slide 7.
Slide 3
This is not theoretical (1 of 3)
[accumulo@secure-2 lib]# accumulo shell -u root
Password: **************
Shell - Apache Accumulo Interactive Shell
-
- version: 1.6.0-cdh5.1.4
- instance name: accumulo
- instance id: cce72c83-826a-41bf-a11f-a8aecebeebaf
-
- type 'help' for a list of available commands
-
root@accumulo table1> scan -s public,private
alice properties:age [public] 48
alice properties:ssn [private] 123-45-6789
bob properties:age [public] 51
bob properties:ssn [private] 231-32-6789
root@accumulo table1> quit
Accumulo user with proper visibility authorizations accessing data.
Slide 4
This is not theoretical (2 of 3)
[hdfs@secure-2 ~]$ hadoop distcp \
hdfs://secure-1:8020/accumulo/tables/3/default_tablet/F000018w.rf \
hdfs://insecure-1:8020/tmp/table1_export_dest/
HDFS administrator copying an RFile from a cluster on which he has no privileges to one on which he does.
Slide 5
This is not theoretical (3 of 3)
[root@insecure-5 ~]# accumulo shell -u root
Password: **************
Shell - Apache Accumulo Interactive Shell
-
- version: 1.6.0-cdh5.1.4
- instance name: accumulo
- instance id: ebfe2e64-ba12-4231-8261-3a89115046ed
-
- type 'help' for a list of available commands
-
root@accumulo> importtable table1_copy /tmp/table1_export_dest
root@accumulo> setauths -u root -s public,private
root@accumulo> scan -t table1_copy -s public,private
alice properties:age [public] 48
alice properties:ssn [private] 123-45-6789
bob properties:age [public] 51
bob properties:ssn [private] 231-32-6789
root@accumulo> quit
HDFS admin reading unauthorized data.
Slide 6
Past and future threats, where we left off
[Diagram. Components: Client Machines, Accumulo Cluster, HDFS Cluster, Zookeeper Cluster. Actors: Sven (User), Jo (User), Molly (Network Admin), William (Accumulo Admin), Halim (HDFS Admin). Label: Trusted Zone (implicit).]
Image design credit: Michael Allen, see slide 7.
Slide 7
Accumulo SecretKeyEncryptionStrategy
• Accumulo encryption at rest encrypts each RFile and WAL file with a data encryption key (DEK)
• Data encryption keys are encrypted with a key encryption key (KEK)
• Data is secure at rest and in transit
• Key encryption key is stored in HDFS (default implementation)
See Michael Allen's "Past and Future Threats: Encryption and Security in Accumulo" presentation from Accumulo Summit 2014 for more detail on message encryption (SSL) and data encryption support in Accumulo 1.6:
http://accumulosummit.com/archives/2014/program/talks/
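The DEK/KEK layering on this slide can be sketched in a few lines. This is a minimal illustration, not Accumulo's implementation: the names toy_cipher, encrypt_file and decrypt_file are hypothetical, and a SHA-256 keystream XOR stands in for the AES cipher suites Accumulo would use, purely so the sketch runs with the Python standard library alone.

```python
import hashlib
import os


def toy_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256 keystream. Toy stand-in for AES; not for real use.

    XOR is its own inverse, so the same call both encrypts and decrypts.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        pad = hashlib.sha256(key + nonce + counter).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


def encrypt_file(plaintext: bytes, kek: bytes) -> dict:
    """Encrypt one file with a fresh DEK, then wrap that DEK with the KEK."""
    dek = os.urandom(16)
    data_nonce, wrap_nonce = os.urandom(16), os.urandom(16)
    return {
        "ciphertext": toy_cipher(dek, data_nonce, plaintext),
        "data_nonce": data_nonce,
        # The wrapped DEK travels with the file; the KEK is stored elsewhere.
        "wrapped_dek": toy_cipher(kek, wrap_nonce, dek),
        "wrap_nonce": wrap_nonce,
    }


def decrypt_file(stored: dict, kek: bytes) -> bytes:
    """Unwrap the DEK with the KEK, then decrypt the file contents."""
    dek = toy_cipher(kek, stored["wrap_nonce"], stored["wrapped_dek"])
    return toy_cipher(dek, stored["data_nonce"], stored["ciphertext"])


if __name__ == "__main__":
    kek = os.urandom(16)
    stored = encrypt_file(b"alice properties:ssn [private] 123-45-6789", kek)
    print(decrypt_file(stored, kek))
```

Anyone who holds the KEK can unwrap every DEK, so the protection of the encrypted files is only as strong as the protection of the KEK itself.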
Slide 8
Enabling Accumulo encryption - accumulo-site.xml
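The XML shown on this slide did not survive in the transcript. As a hedged reconstruction, the sketch below renders a set of crypto properties as property entries for accumulo-site.xml. The property names and class names are recalled from the Accumulo 1.6 configuration reference and are assumptions here, not the speaker's actual file, so check them against your Accumulo version. render_site_xml is a hypothetical helper.

```python
from xml.sax.saxutils import escape

# Assumed Accumulo 1.6 encryption-at-rest settings; verify before use.
CRYPTO_PROPERTIES = {
    "crypto.module.class":
        "org.apache.accumulo.core.security.crypto.DefaultCryptoModule",
    "crypto.cipher.suite": "AES/CFB/NoPadding",
    "crypto.cipher.algorithm.name": "AES",
    "crypto.cipher.key.length": "128",
    "crypto.secure.rng": "SHA1PRNG",
    "crypto.secure.rng.provider": "SUN",
    "crypto.secret.key.encryption.strategy.class":
        "org.apache.accumulo.core.security.crypto."
        "CachingHDFSSecretKeyEncryptionStrategy",
    "crypto.default.key.strategy.cipher.suite": "AES/ECB/NoPadding",
}


def render_site_xml(properties: dict) -> str:
    """Render name/value pairs in the Hadoop-style <configuration> layout."""
    lines = ["<configuration>"]
    for name, value in properties.items():
        lines += [
            "  <property>",
            f"    <name>{escape(name)}</name>",
            f"    <value>{escape(value)}</value>",
            "  </property>",
        ]
    lines.append("</configuration>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_site_xml(CRYPTO_PROPERTIES))
```

The strategy class is the piece that decides where the KEK lives; with the assumed default strategy it is a file in HDFS, which is what the following slides attack.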
Slide 9
Access attempt thwarted
[root@insecure-5 ~]# accumulo shell -u root
root@accumulo> importtable table1_copy /tmp/table1_export_dest
root@accumulo> scan -t table1_copy -s public,private
2015-04-18 22:45:15,282 [shell.Shell] ERROR: java.lang.RuntimeException:
org.apache.accumulo.core.client.impl.AccumuloServerException:
Error on server insecure-5.vpc.cloudera.com:10011
root@accumulo> quit
The HDFS admin's attempt to read unauthorized data fails.
Slide 10
Data access threats summarized
Vector | Protection mechanism
Unauthorized users | Visibility labels
Network administrator | Thrift/SSL
HDFS administrator | Accumulo encryption at rest
Misconfiguration | All of the above
Slide 11
Not so fast!
[hdfs@secure-2 ~]$ hadoop distcp \
hdfs://secure-1:8020/accumulo/crypto/secret/keyEncryptionKey \
hdfs://insecure-4:8020/accumulo/accumulo/crypto/secret/keyEncryptionKey
HDFS admin can copy the Accumulo key encryption key!
Slide 12
Current threats: nearly back where we started?!
[Diagram. Components: Client Machines, Accumulo Cluster, HDFS Cluster, Zookeeper Cluster. Actors: Sven (User), Jo (User), Molly (Network Admin), William (Accumulo Admin), Halim (HDFS Admin). Label: Trusted Zone (implicit).]
Slide 13
An interlude: HDFS transparent encryption at rest
• Data in encryption zones is transparently encrypted by HDFS client
• Secure at rest and in transit
• Prevents attacks at HDFS, FS and OS levels
• Key management is independent of HDFS
• Designed for performance, scalability, compartmentalization and compatibility
• Keys are stored by Hadoop Key Management Service (KMS)
• Proxy between key store and HDFS encryption subsystems on HDFS client/server
Slide 14
HDFS encryption, simple version
[Diagram. Components: HDFS client, HDFS Cluster (HDFS Name Node with file metadata, HDFS Data Node with HDFS files), Hadoop KMS. Interfaces: REST/HTTP, Hadoop Key Provider API.]
1. User or process creates key (KEK).
2. HDFS admin creates encryption zone, associating an empty directory with a KEK.
3. User or process initiates read/write to file in encryption zone.
Slide 15
HDFS encryption, name node actions
[Diagram. Components: HDFS client, HDFS Cluster (HDFS Name Node with file metadata, HDFS Data Node with HDFS files), Hadoop KMS. Interfaces: REST/HTTP, Hadoop Key Provider API.]
3. User or process initiates read/write to file in encryption zone.
4. On file creation, name node requests encrypted data encryption key (EDEK) from KMS. EDEK is stored with file metadata on Name Node.
5. Name node returns file stream and encrypted key to client.
Slide 16
HDFS encryption, client actions
[Diagram. Components: HDFS client, HDFS Cluster (HDFS Name Node with file metadata, HDFS Data Node with HDFS files), Hadoop KMS. Interfaces: REST/HTTP, Hadoop Key Provider API.]
6. Client requests decrypted DEK from KMS.
7. KMS uses KEK to decrypt DEK and returns decrypted DEK to client.
8. Client uses DEK to read/write encrypted data to/from stream.
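Steps 1 through 8 can be traced end to end in a small in-memory model. This is only a sketch of the message flow: ToyKMS, ToyNameNode, client_write and client_read are hypothetical names, the data node is a plain dict, zones match only a file's direct parent directory, and a SHA-256 keystream XOR stands in for AES/CTR so the example needs nothing beyond the standard library.

```python
import hashlib
import os


def toy_cipher(key: bytes, iv: bytes, data: bytes) -> bytes:
    """XOR with a SHA-256 keystream; toy stand-in for AES/CTR, not for real use."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        pad = hashlib.sha256(key + iv + counter).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


class ToyKMS:
    """Holds the KEKs; a KEK never leaves this object."""

    def __init__(self):
        self._keks = {}

    def create_key(self, name: str) -> None:            # step 1
        self._keks[name] = os.urandom(16)

    def generate_edek(self, key_name: str) -> dict:     # step 4
        iv, dek = os.urandom(16), os.urandom(16)
        return {"key_name": key_name, "iv": iv,
                "edek": toy_cipher(self._keks[key_name], iv, dek)}

    def decrypt_edek(self, record: dict) -> bytes:      # steps 6-7
        kek = self._keks[record["key_name"]]
        return toy_cipher(kek, record["iv"], record["edek"])


class ToyNameNode:
    """Tracks encryption zones and stores each file's EDEK with its metadata."""

    def __init__(self, kms: ToyKMS):
        self._kms = kms
        self._zones = {}     # directory -> KEK name
        self.metadata = {}   # file path -> EDEK record

    def create_zone(self, directory: str, key_name: str) -> None:   # step 2
        self._zones[directory] = key_name

    def create_file(self, path: str) -> dict:                       # steps 3-5
        key_name = self._zones[path.rsplit("/", 1)[0]]
        self.metadata[path] = self._kms.generate_edek(key_name)
        return self.metadata[path]

    def open_file(self, path: str) -> dict:                         # steps 3, 5
        return self.metadata[path]


FILE_IV = b"toy-file-iv"  # fixed only because every file gets a fresh DEK here


def client_write(kms, name_node, data_node, path, plaintext):
    dek = kms.decrypt_edek(name_node.create_file(path))             # steps 6-7
    data_node[path] = toy_cipher(dek, FILE_IV, plaintext)           # step 8


def client_read(kms, name_node, data_node, path):
    dek = kms.decrypt_edek(name_node.open_file(path))               # steps 6-7
    return toy_cipher(dek, FILE_IV, data_node[path])                # step 8


if __name__ == "__main__":
    kms = ToyKMS()
    kms.create_key("accumulo-key")
    name_node = ToyNameNode(kms)
    name_node.create_zone("/accumulo/crypto/secret", "accumulo-key")
    data_node = {}
    path = "/accumulo/crypto/secret/keyEncryptionKey"
    client_write(kms, name_node, data_node, path, b"kek bytes")
    print(client_read(kms, name_node, data_node, path))
```

In this model the name node and data node only ever see the EDEK and ciphertext; the plaintext DEK exists only in the client and the KMS, which is the compartmentalization the slides describe.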
Slide 17
Hadoop KMS: what's in the black orange box?
[Diagram. Hadoop KMS box with interfaces REST/HTTPS and Hadoop Key Provider API.]
• Hadoop Key Management Server is a proxy between KMS clients and a backing key store
• Default store is a Java key store file
• Implementations for full-featured key servers with support for Hardware Security Module (HSM) integration available today
• HSM integration moves the root of trust to the HSM
• Provides a unified API and scalability
• Configurable caching support
• Provides key lifecycle management (create, delete, roll, etc.)
• Provides a broad set of access control capabilities
• Per-user ACL configuration for access to KMS
• Per-key ACL configuration for access to specific keys
• Strong authentication via Kerberos support
• Full-featured hadoop shell command line provided
Slide 18
Hadoop KMS ACL example: blacklisting hdfs admin
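The configuration shown on this slide is not reproduced in the transcript. The sketch below models, in simplified form, how a KMS blacklist entry overrides an otherwise permissive ACL. The property name patterns (hadoop.kms.acl.OP, hadoop.kms.blacklist.OP, key.acl.KEY.OP, default.key.acl.OP) follow the Hadoop KMS ACL documentation and should be verified against your Hadoop version; the values, the is_allowed helper and the users-only parsing (groups and whitelist key ACLs are ignored) are assumptions for illustration.

```python
# Assumed kms-acls.xml contents, expressed as name -> value pairs.
KMS_ACLS = {
    "hadoop.kms.acl.DECRYPT_EEK": "*",            # anyone may ask the KMS ...
    "hadoop.kms.blacklist.DECRYPT_EEK": "hdfs",   # ... except the HDFS superuser
    "key.acl.accumulo-key.DECRYPT_EEK": "accumulo",
}


def _matches(acl_value: str, user: str) -> bool:
    """Users-only ACL match: '*' means everyone, else a comma-separated list."""
    if acl_value.strip() == "*":
        return True
    # Real ACL values have the form "users groups"; groups are ignored here.
    users = acl_value.split(" ")[0]
    return user in [name for name in users.split(",") if name]


def is_allowed(acls: dict, user: str, operation: str, key_name: str) -> bool:
    """Simplified decision: blacklist, then KMS-level ACL, then key-level ACL."""
    # 1. A blacklist entry always wins.
    if _matches(acls.get(f"hadoop.kms.blacklist.{operation}", ""), user):
        return False
    # 2. KMS-level ACL, treated as '*' when not configured.
    if not _matches(acls.get(f"hadoop.kms.acl.{operation}", "*"), user):
        return False
    # 3. Key-level ACL, falling back to the default key ACL; absent means denied.
    key_acl = acls.get(
        f"key.acl.{key_name}.{operation}",
        acls.get(f"default.key.acl.{operation}", ""),
    )
    return _matches(key_acl, user)


if __name__ == "__main__":
    for user in ("accumulo", "hdfs", "sven"):
        print(user, is_allowed(KMS_ACLS, user, "DECRYPT_EEK", "accumulo-key"))
```

Under these assumed settings only the accumulo user can have an EDEK for accumulo-key decrypted; the hdfs user is refused even though the KMS-level ACL is open to everyone.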
Slide 19
Finally, HDFS admin is truly blocked
[hdfs@secure-2 ~]$ hadoop distcp \
hdfs://secure-1:8020/accumulo/crypto/secret/keyEncryptionKey \
hdfs://insecure-4:8020/accumulo/crypto/secret/keyEncryptionKey
15/04/18 22:41:09 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
...
15/04/18 22:41:10 ERROR util.RetriableCommand: Failure in Retriable command: Copying hdfs://secure-1.vpc.cloudera.com:8020/accumulo/crypto/secret/keyEncryptionKey to hdfs://insecure-4.vpc.cloudera.com:8020/accumulo/crypto/secret/keyEncryptionKey
org.apache.hadoop.tools.mapred.RetriableFileCopyCommand$CopyReadException: org.apache.hadoop.security.authorize.AuthorizationException: User:hdfs not allowed to do 'DECRYPT_EEK' on 'accumulo-key'
HDFS admin can no longer copy the Accumulo key encryption key!
Slide 20
Well, mostly blocked...
[hdfs@secure-2 ~]$ hadoop distcp \
hdfs://secure-1:8020/.reserved/raw/accumulo/crypto/secret/keyEncryptionKey \
hdfs://insecure-4:8020/.reserved/raw/accumulo/crypto/secret/keyEncryptionKey
15/04/18 22:43:52 INFO mapred.LocalJobRunner: OutputCommitter set in config null
15/04/18 22:43:52 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
...
15/04/18 22:43:53 INFO mapreduce.Job: Job job_local2004063696_0001 completed successfully
The /.reserved/raw virtual path allows HDFS admins to perform distcp operations, but the data is moved in its encrypted form. No decryption occurs.
Slide 21
Accumulo SecretKeyEncryptionStrategy revisited
• Accumulo encryption at rest encrypts each RFile with a data encryption key (DEK)
• Data encryption keys are encrypted with a key encryption key (KEK)
• Data is secure at rest and in transit
• Key encryption key is stored in HDFS
• Options to protect the Accumulo KEK
• Leverage HDFS encryption in Hadoop 2.6
• Default SecretKeyEncryptionStrategy with HDFS encryption
• Accumulo on HDFS encryption
• Custom SecretKeyEncryptionStrategy
Slide 22
Using HDFS encryption to protect the Accumulo KEK via the Hadoop KMS
[Diagram. Components: Accumulo Cluster, HDFS Cluster, Zookeeper Cluster, Hadoop KMS. Actors: William (Accumulo Admin), Halim (HDFS Admin). Label: Trusted Zone (implicit).]
Slide 23
Moving Accumulo KEK to an encryption zone
# sudo -u accumulo hadoop key create accumulo-key
accumulo-key has been successfully created with options Options{cipher='AES/CTR/NoPadding', bitLength=128, description='null', attributes=null}.
KMSClientProvider[http://secure-3.vpc.cloudera.com:16000/kms/v1/] has been updated.
# sudo -u accumulo hadoop fs -mv /accumulo/crypto/secret /accumulo/crypto/secret-tmp
# sudo -u hdfs hadoop fs -mkdir -p /accumulo/crypto/secret
# sudo -u hdfs hadoop fs -chown accumulo:accumulo /accumulo/crypto/secret
# sudo -u hdfs hdfs crypto -createZone -keyName accumulo-key -path /accumulo/crypto/secret
Added encryption zone /accumulo/crypto/secret
# sudo -u hdfs hadoop distcp -pugpx -skipcrccheck -update /accumulo/crypto/secret-tmp \
/accumulo/crypto/secret
# sudo -u accumulo hadoop fs -rm -r /accumulo/crypto/secret-tmp
Deleted /accumulo/crypto/secret-tmp
Creating the KEK in an existing encryption zone is much simpler.
Slide 24
Tradeoffs of hybrid approach (Accumulo KEK + KMS)
Pros
• Least effort path forward
• Minimal operational risk
• Minimal Accumulo downtime
• Allows gentle adoption of HDFS encryption and Hadoop KMS
• Leverage nearly all administrative capabilities of Hadoop KMS
Cons
• Accumulo 1.6 encryption at rest supports rfiles and write-ahead logs, but not yet recovered write-ahead logs
• Current implementation and framework is experimental
Slide 25
Using HDFS encryption to protect the Accumulo directory directly
[Diagram. Components: Accumulo Cluster, HDFS Cluster, Zookeeper Cluster, Hadoop KMS. Actors: William (Accumulo Admin), Halim (HDFS Admin). Label: Trusted Zone (implicit).]
Slide 26
Moving Accumulo directory to an encryption zone
# sudo -u accumulo hadoop key create accumulo-key
accumulo-key has been successfully created with options Options{cipher='AES/CTR/NoPadding', bitLength=128, description='null', attributes=null}.
KMSClientProvider[http://secure-3.vpc.cloudera.com:16000/kms/v1/] has been updated.
# sudo -u accumulo hadoop fs -mv /accumulo /accumulo-tmp
# sudo -u hdfs hadoop fs -mkdir /accumulo
# sudo -u hdfs hadoop fs -chown accumulo:accumulo /accumulo
# sudo -u hdfs hdfs crypto -createZone -keyName accumulo-key -path /accumulo
Added encryption zone /accumulo
# sudo -u hdfs hadoop distcp -pugpx -skipcrccheck -update /accumulo-tmp /accumulo
# sudo -u hdfs hadoop fs -rm -r /accumulo-tmp
Deleted /accumulo-tmp
Stop tablet servers before moving the data directory.
Slide 27
Tradeoffs of full HDFS encryption approach
Pros
• Least effort path forward
• HDFS encryption and KMS can be leveraged by multiple services (skill re-use and operational efficiency)
• HDFS encryption and KMS are generally available
• Leverage all administrative capabilities of HDFS encryption and Hadoop KMS
Cons
• Moderate operational risk (see HBase)
• Accumulo downtime during data move
• Possible operational performance impact
Slide 28
Other options
• Custom SecretKeyEncryptionStrategy
• Tighter connection to core Accumulo functionality and release cycle
• Support arbitrary key servers
• But it's easy to get the details of both encryption and key management wrong
• Arbitrary key server support can also be developed via a custom key provider for the Hadoop KMS
• Native KMS SecretKeyEncryptionStrategy
• Leverage administrative functions of KMS without relying on HDFS encryption
Slide 29
Thank you. Let's talk about keys!