Hadoop Security

Uploaded by shivaji-dutta on 06-Aug-2015

TRANSCRIPT

Page 1 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

Hadoop Security with HDP/PHD

Page 2

Disclaimer

This document may contain product features and technology directions that are under development or may be under development in the future.

Technical feasibility, market demand, user feedback, and the Apache Software Foundation community development process can all affect timing and final delivery.

This document’s description of these features and technology directions does not represent a contractual commitment from Hortonworks to deliver these features in any generally available product.

Product features and technology directions are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.

Page 3

Agenda

• Hadoop Security

• Kerberos

• Authorization and Auditing with Ranger

• Gateway Security with Knox

• Encryption

Page 4

Security today in Hadoop with HDP/PHD

• Wire encryption in Hadoop
• Native and partner encryption
• Centralized audit reporting w/ Apache Ranger
• Fine-grained access control with Apache Ranger

Authentication – Who am I / prove it? (Kerberos; API security with Apache Knox)
Authorization – What can I do?
Audit – What did I do?
Data Protection – Can data be encrypted at rest and over the wire?

HDP/PHD – Centralized Security Administration
Enterprise Services: Security

Page 5

Security needs are changing

Administration – Central management & consistent security
Authentication – Authenticate users and systems
Authorization – Provision access to data
Audit – Maintain a record of data access
Data Protection – Protect data at rest and in motion

• YARN unlocks the data lake
• Multi-tenant: multiple applications for data access
• Different kinds of data
• Changing and complex compliance environment

Fall 2013: Largely siloed deployments with single-workload clusters
2014: 65% of clusters host multiple workloads

Page 6

Typical Flow – Hive Access through the Beeline Client

Beeline Client → HiveServer2 → HDFS (A, B, C)
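The flow above can be exercised from the command line. A minimal sketch, assuming a HiveServer2 at hive-host:10000 and a table named sample_table (all placeholders); the `run` wrapper only prints each command, so the sketch is safe to execute anywhere:

```shell
# Dry-run wrapper: prints the command instead of executing it.
# On a real cluster, replace the body with: "$@"
run() { printf '+ %s\n' "$*"; }

HS2_HOST=hive-host          # placeholder HiveServer2 host
HS2_PORT=10000              # default HiveServer2 binary-transport port

# Beeline talks JDBC to HiveServer2; HiveServer2 in turn reads HDFS.
run beeline -u "jdbc:hive2://${HS2_HOST}:${HS2_PORT}/default" \
    -e "SELECT COUNT(*) FROM sample_table;"
```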

Page 7

Typical Flow – Authenticate through Kerberos

Client:
• Requests a TGT from the KDC
• Receives the TGT and decrypts it with the password hash
• Sends the TGT and receives a service ticket

1. Beeline client uses the Hive service ticket to submit the query to HiveServer2
2. Hive gets a NameNode (NN) service ticket
3. Hive creates the MapReduce job using the NN service ticket
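In command-line terms, the ticket exchange above looks roughly like this. A sketch assuming MIT Kerberos client tools, a realm EXAMPLE.COM, a user alice, and a Hive principal hive/hive-host@EXAMPLE.COM (all placeholders); the `run` wrapper prints rather than executes:

```shell
run() { printf '+ %s\n' "$*"; }   # dry-run wrapper; swap in "$@" on a real cluster

REALM=EXAMPLE.COM                 # placeholder Kerberos realm
HS2_HOST=hive-host                # placeholder HiveServer2 host

# 1. Request a TGT from the KDC; the reply is decrypted with the user's password hash.
run kinit "alice@${REALM}"

# 2. Inspect the ticket cache (the TGT, then service tickets as they are acquired).
run klist

# 3. Beeline presents the TGT and obtains a Hive service ticket transparently;
#    the JDBC URL names the HiveServer2 service principal.
run beeline -u "jdbc:hive2://${HS2_HOST}:10000/default;principal=hive/${HS2_HOST}@${REALM}"
```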

Page 8

Typical Flow – Add Authorization through Ranger (XA Secure)

1. Client gets a service ticket for Hive from the KDC
2. Beeline client uses the Hive service ticket to submit the query; Ranger checks authorization at HiveServer2
3. Hive gets a NameNode (NN) service ticket
4. Hive creates the MapReduce job using the NN service ticket

Page 9

Typical Flow – Firewall, Route through Knox Gateway

1. Client sends the original request with user id/password to Apache Knox
2. Knox gets a service ticket for Hive from the KDC
3. Knox runs as a proxy user and uses the Hive service ticket to submit the query; Ranger checks authorization at HiveServer2
4. Hive gets a NameNode (NN) service ticket
5. Hive creates the MapReduce job using the NN service ticket
6. Client gets the query result

Page 10

Typical Flow – Add Wire and File Encryption

The same Knox-routed flow as on the previous slide (original request with user id/password to Knox; Knox gets a Hive service ticket and runs as proxy user; Hive gets a NameNode service ticket and creates the MapReduce job; client gets the query result), now with encryption on the wire: SSL on the Beeline client, Knox, HiveServer2 and HDFS hops, and SASL on Hadoop RPC.

Page 11

Security Features – PHD/HDP Security

Authentication
• Kerberos support ✔
• Perimeter security for services and REST APIs ✔

Authorization
• Fine-grained access control for HDFS, HBase, Hive, Storm and Knox
• Role-based access control ✔
• Column level ✔
• Permission support: Create, Drop, Index, Lock, User

Auditing
• Resource access auditing: extensive auditing
• Policy auditing ✔

Page 12

Security Features – HDP/PHD Security w/ Ranger

Data Protection
• Wire encryption ✔
• Volume encryption: TDE
• File/column encryption: HDFS TDE & partners

Reporting
• Global view of policies and audit data ✔

Manage
• User/group mapping ✔
• Global policy manager, Web UI ✔
• Delegated administration ✔

Page 13

Partner Integration

Security integrations:
• Ranger plugins: centralize authorization/audit of 3rd-party software in the Ranger UI
• Via a custom Log4j appender, audit events can be streamed to INFA infrastructure
• Knox: route partner APIs through Knox after validating compatibility
• Provide SSO capability to end users

Page 14

Authentication w/ Kerberos

Page 15

Kerberos in the field

Kerberos is no longer “too complex”; adoption is growing.
• Ambari helps automate and manage Kerberos integration with the cluster

Use Active Directory, or a combined Kerberos/Active Directory setup.
• Active Directory is seen most commonly in the field
• Many start with a separate MIT KDC and later grow into the AD KDC

Knox should be considered for API/perimeter security.
• Removes the need for Kerberos for end users
• Enables integration with different authentication standards
• Single location to manage security for REST APIs & HTTP-based services
• Tip: deploy in the DMZ

Page 22

Authorization and Auditing w/ Apache Ranger

Page 23

Authorization and Audit

Authorization – fine-grained access control
• HDFS – folder, file
• Hive – database, table, column
• HBase – table, column family, column
• Storm, Knox and more

Audit – extensive user access auditing in HDFS, Hive and HBase
• IP address
• Resource type / resource
• Timestamp
• Access granted or denied

Control access into the system; flexibility in defining policies.

Page 24

Central Security Administration

Apache Ranger
• Delivers a ‘single pane of glass’ for the security administrator
• Centralizes administration of security policy
• Ensures consistent coverage across the entire Hadoop stack

Page 25

Setup Authorization Policies

File-level access control, flexible policy definition, control of permissions

Page 26

Monitor through Auditing

Page 27

Apache Ranger Flow

Page 28

Authorization and Auditing w/ Ranger

[Diagram] Enterprise users access Hadoop components – HDFS, HBase, HiveServer2, Knox and Storm – each fitted with a Ranger plugin. The plugins enforce policies from the Ranger Policy Server and send events to the Ranger Audit Server (backed by an RDBMS); the Ranger Administration Portal manages both, and an integration API connects legacy tools, data governance, and enterprise security services. Some plugins are marked as HDP 2.2 additions, with more planned for 2015.

Page 29

Installation Steps

• Install PHD 3.0
• Install Apache Ranger (https://tinyurl.com/mlgs3jy)
  – Install Policy Manager
  – Install User Sync
  – Install Ranger Plugins
• Start Policy Manager
  – service ranger-admin start
• Verify at http://<host>:6080/ (login: admin/admin)
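The start/verify steps can be scripted. A sketch with a placeholder hostname; the `run` wrapper prints the commands instead of executing them, so it can be tried without a Ranger node:

```shell
run() { printf '+ %s\n' "$*"; }   # dry-run wrapper; swap in "$@" on the Ranger admin node

RANGER_HOST=ranger-host           # placeholder

# Start the Ranger policy manager (admin) service.
run service ranger-admin start

# Verify the web UI answers on port 6080; the default login is admin/admin
# (change it immediately after first login).
run curl -s -o /dev/null -w '%{http_code}' "http://${RANGER_HOST}:6080/"
```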

Page 30

Ranger Plugins

• HDFS

• HIVE

• KNOX

• STORM

• HBASE

Steps to enable plugins:

1. Start the Policy Manager
2. Create the plugin repository in the Policy Manager
3. Install the plugin
   • Edit install.properties
   • Execute ./enable-<plugin>.sh
4. Restart the plugin’s service (e.g. HDFS, Hive, etc.)
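Steps 3 and 4 for the HDFS plugin might be scripted as below. The install path, policy manager URL and repository name are placeholders that will differ per install; the `run` wrapper prints instead of executing:

```shell
run() { printf '+ %s\n' "$*"; }          # dry-run wrapper; swap in "$@" on a real node

PLUGIN_DIR=/usr/lib/ranger-hdfs-plugin   # placeholder install location

# Step 3a: point the plugin at the policy manager and the repository
# created in step 2 by editing install.properties in place.
run sed -i \
    -e 's|^POLICY_MGR_URL=.*|POLICY_MGR_URL=http://ranger-host:6080|' \
    -e 's|^REPOSITORY_NAME=.*|REPOSITORY_NAME=hadoopdev|' \
    "${PLUGIN_DIR}/install.properties"

# Step 3b: enable the plugin.
run "${PLUGIN_DIR}/enable-hdfs-plugin.sh"

# Step 4: restart the protected service (here the NameNode, e.g. via Ambari)
# so the plugin is picked up.
```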

Page 31

Ranger Console

• The Repository Manager tab
• The Policy Manager tab
• The User/Group tab
• The Analytics tab
• The Audit tab

Page 32

Repository Manager

• Add New Repository
• Edit Repository
• Delete Repository

Page 33

Demo

Page 34

REST API Security through Knox – Securely Share the Hadoop Cluster

Page 35

Share Data Lake with everyone - Securely

• Simplifies access: extends Hadoop’s REST/HTTP services while encapsulating Kerberos within the cluster.

• Enhances security: Exposes Hadoop’s REST/HTTP services without revealing network details, providing SSL out of the box.

• Centralized control: Enforces REST API security centrally, routing requests to multiple Hadoop clusters.

• Enterprise integration: Supports LDAP, Active Directory, SSO, SAML and other authentication systems.
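Because Knox terminates SSL and handles authentication at the gateway, a client needs only HTTPS and basic credentials; Kerberos stays inside the cluster. A sketch against a hypothetical gateway (host, user and password are placeholders; real deployments typically prefix paths with /gateway/<topology>); the `run` wrapper prints rather than executes:

```shell
run() { printf '+ %s\n' "$*"; }   # dry-run wrapper; swap in "$@" against a real gateway

KNOX=https://knox-host:8443       # placeholder gateway address

# LDAP/AD credentials over HTTPS; Knox exchanges them for Kerberos inside
# the cluster. -k is only for self-signed demo certificates.
run curl -k -u alice:password \
    "${KNOX}/webhdfs/v1/tmp?op=LISTSTATUS"
```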

Page 36

Apache Knox

Knox can be used with both unsecured and Kerberos-secured Hadoop clusters. In an enterprise solution that employs Kerberos-secured clusters, the Apache Knox Gateway provides an enterprise security solution that:

• Integrates well with enterprise identity management solutions

• Protects the details of the Hadoop cluster deployment (hosts and ports are hidden from end users)

• Reduces the number of services with which a client needs to interact

Page 37

Extend Hadoop API reach with Knox

[Diagram] Applications (App A–N) in the application tier, and business users behind them, reach the Hadoop cluster through a load balancer and Knox, using JDBC/ODBC and REST/HTTP. Data operators drive data ingest and ETL (Falcon, Oozie, Sqoop, Flume) over RPC calls; Hadoop admins/operators connect over SSH via a bastion node.

Page 38

Typical Flow – Add Wire and File Encryption

Original request with user id/password goes to Apache Knox over SSL; Knox gets a service ticket for Hive from the KDC, runs as a proxy user, and uses the Hive service ticket to submit the query to HiveServer2; Hive gets a NameNode (NN) service ticket and creates the MapReduce job with it; the client gets the query result. Wire encryption: SSL between the Beeline client, Knox, HiveServer2 and HDFS, and SASL on Hadoop RPC.

Page 39

Why Knox?

Simplified Access
• Kerberos encapsulation
• Extends API reach
• Single access point
• Multi-cluster support
• Single SSL certificate

Centralized Control
• Central REST API auditing
• Service-level authorization
• Alternative to SSH “edge node”

Enterprise Integration
• LDAP integration
• Active Directory integration
• SSO integration
• Apache Shiro extensibility
• Custom extensibility

Enhanced Security
• Protect network details
• SSL for non-SSL services
• WebApp vulnerability filter

Page 40

Hadoop REST API with Knox

Service | Direct URL | Knox URL
WebHDFS | http://namenode-host:50070/webhdfs | https://knox-host:8443/webhdfs
WebHCat | http://webhcat-host:50111/templeton | https://knox-host:8443/templeton
Oozie | http://oozie-host:11000/oozie | https://knox-host:8443/oozie
HBase | http://hbase-host:60080 | https://knox-host:8443/hbase
Hive | http://hive-host:10001/cliservice | https://knox-host:8443/hive
YARN | http://yarn-host:yarn-port/ws | https://knox-host:8443/resourcemanager

• Masters could be on many different hosts; with Knox: one host, one port
• Consistent paths
• SSL config at one host
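The rewrite pattern in the table is mechanical: many host:port pairs collapse to one gateway origin with consistent paths. A small, fully runnable illustration (the gateway host and the service-to-path map are taken from the table above):

```shell
KNOX_BASE=https://knox-host:8443   # one host, one port, one SSL certificate

# Map a service name to its Knox URL, mirroring the table above.
knox_url() {
    case "$1" in
        webhdfs) path=webhdfs ;;
        webhcat) path=templeton ;;
        oozie)   path=oozie ;;
        hbase)   path=hbase ;;
        hive)    path=hive ;;
        yarn)    path=resourcemanager ;;
        *)       echo "unknown service: $1" >&2; return 1 ;;
    esac
    printf '%s/%s\n' "$KNOX_BASE" "$path"
}

# Print the mapping for every service in the table.
for svc in webhdfs webhcat oozie hbase hive yarn; do
    printf '%-8s -> %s\n' "$svc" "$(knox_url "$svc")"
done
```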

Page 41

Hadoop REST API Security: Drill-Down

[Diagram] A REST client connects over HTTP through a firewall into the DMZ, where a load balancer (LB) fronts Knox Gateway instances (GW). Knox authenticates users against the enterprise identity provider over LDAP/AD, then forwards HTTP requests through a second firewall to the masters (RM, NN, WebHCat, Oozie, HS2, HBase) of one or more Hadoop clusters, each with DN/NM slaves. An edge node with Hadoop CLIs talks RPC to the clusters directly.

Page 42

Knox – Features in PHD

• Use Ambari for install/start/stop/configuration
• Knox support for HDFS HA
• Support for the YARN REST API
• Support for SSL to Hadoop cluster services (WebHDFS, HBase, Hive & Oozie)
• Integration with Ranger for Knox service-level authorization
• Knox management REST API

Page 43

Installation

• Installed via Ambari
  – This can also be done manually
  – Start the embedded LDAP
• There are good examples with Groovy scripts in the Apache docs
  – https://knox.apache.org/books/knox-0-4-0/knox-0-4-0.html
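For a sandbox, the Knox docs linked above start the bundled demo LDAP and smoke-test the gateway roughly as follows. The gateway home, topology name and demo credentials are assumptions (they match common HDP sandbox defaults but may differ); the `run` wrapper prints instead of executing:

```shell
run() { printf '+ %s\n' "$*"; }   # dry-run wrapper; swap in "$@" on a real gateway host

KNOX_HOME=/usr/lib/knox           # typical gateway home; adjust for your install

# Start the embedded (demo) LDAP server that backs the sample topology.
run "${KNOX_HOME}/bin/ldap.sh" start

# Start the gateway itself.
run "${KNOX_HOME}/bin/gateway.sh" start

# Smoke test: list HDFS through the gateway with a demo LDAP user
# (guest/guest-password in the sample topology); -k tolerates the
# self-signed demo certificate.
run curl -k -u guest:guest-password \
    "https://localhost:8443/gateway/default/webhdfs/v1/?op=LISTSTATUS"
```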

Page 44

Data Protection – Wire and Data-at-Rest Encryption

Page 45

Data Protection

HDP allows you to apply data protection policy at different layers across the Hadoop stack.

Layer | What? | How?
Storage and Access | Encrypt data while it is at rest | Partners, HDFS TDE (tech preview), HBase encryption, OS-level encryption
Transmission | Encrypt data as it moves | Supported from HDP 2.1

Page 49

HDFS Transparent Data Encryption (TDE) in 2.2

• Encryption at a higher level than the OS, while remaining native and transparent to Hadoop
• End-to-end: data is encrypted and decrypted only by the client
  – Encryption/decryption uses the usual HDFS functions from the client
  – No need to change user application code
  – No need to store data encryption keys on HDFS itself
  – HDFS itself never handles unencrypted data
• Data is effectively encrypted at rest, and since it is decrypted on the client side it is also encrypted on the wire while being transmitted
• HDFS file encryption/decryption is transparent to the client
  – Users can read/write files to/from an encryption zone as long as they have permission to access it
• Depends on installing a Key Management Server (KMS)


Page 54

HDFS Transparent Data Encryption (TDE) – Steps

• Install and run KMS on top of HDP 2.2
• Change HDFS params via Ambari
• Create an encryption key
  – hadoop key create key1 -size 256
  – hadoop key list -metadata
• Create an encryption zone using the key
  – hdfs dfs -mkdir /zone1
  – hdfs crypto -createZone -keyName key1 /zone1
  – hdfs crypto -listZones
• http://hortonworks.com/kb/hdfs-transparent-data-encryption/
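End to end, the steps above plus a read/write check look like this. The key name and zone path follow the slide; the final put/cat of /etc/hosts is an added illustration, and the `run` wrapper prints the commands so the sketch is safe to run anywhere without a cluster:

```shell
run() { printf '+ %s\n' "$*"; }   # dry-run wrapper; swap in "$@" on a cluster with KMS

# 1. Create an encryption key in the KMS and confirm it exists.
run hadoop key create key1 -size 256
run hadoop key list -metadata

# 2. Create an empty directory and turn it into an encryption zone keyed by key1.
run hdfs dfs -mkdir /zone1
run hdfs crypto -createZone -keyName key1 /zone1
run hdfs crypto -listZones

# 3. Write and read back a file: encryption/decryption happens in the client,
#    so this looks exactly like plain HDFS I/O.
run hdfs dfs -put /etc/hosts /zone1/hosts
run hdfs dfs -cat /zone1/hosts
```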

Page 55

Thank You