Fine-Grained Security for Spark and Hive

Carter Shanklin, Director PM
Don Bosco Durai, Security Architect
June 29, 2016
© Hortonworks Inc. 2011 – 2016. All Rights Reserved

Uploaded by dataworks-summithadoop-summit on 16-Apr-2017


Page 1: Fine-Grained Security for Spark and Hive


Page 2: Agenda

● Current security options and challenges
● Apache Ranger overview
● LLAP overview
● Use cases and demo
● Apache Atlas integration

Page 3: Current Options and Challenges

Page 4: Current Options and Challenges

⬢ Access control for Spark, Pig and MapReduce is limited to the storage level
⬢ Column-level access control via HiveServer2
⬢ Row-level filtering requires Hive views
  – Multiple Hive views need to be created and managed
  – Explicit permissions need to be granted for each view/user
  – Users need to know which view to use
⬢ Masking requires a custom UDF
  – The UDF needs to be wrapped in views

Page 5: Apache Ranger Overview

Page 6: Apache Ranger

Authorization
• Centralized platform to define, administer and manage security policies consistently
• Enforce policies within each component

Auditing
• Central audit location for all access requests
• Support for multiple destinations (HDFS, Solr, etc.)
• Real-time visual query interface

Ranger KMS
• Store and manage encryption keys
• Support for HDFS TDE
• Integration with HSM
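The split between central policy definition, per-component enforcement and central auditing can be sketched as follows. This is a minimal illustration, not Ranger's plugin API: `Policy`, `Plugin`, the resource string format `"hive:default.users"` and the audit record fields are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass(frozen=True)
class Policy:
    # Hypothetical policy shape: which groups may perform which
    # access types on which resource (e.g. "hive:default.users").
    resource: str
    groups: frozenset
    accesses: frozenset


@dataclass
class Plugin:
    """Illustrative component-side enforcer: evaluates centrally defined
    policies locally and records every decision for auditing."""
    component: str
    policies: List[Policy]
    audit_log: List[dict] = field(default_factory=list)

    def is_allowed(self, user: str, user_groups: Set[str],
                   resource: str, access: str) -> bool:
        allowed = any(
            p.resource == resource
            and access in p.accesses
            and p.groups & user_groups  # non-empty intersection means a match
            for p in self.policies
        )
        # Both allowed and denied requests are written to the audit log.
        self.audit_log.append({
            "component": self.component, "user": user,
            "resource": resource, "access": access, "result": allowed,
        })
        return allowed
```

In this sketch the policies would be pushed from a central administration point, while each plugin makes the decision in-process, which is the division of labor the slide describes.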

Page 7: Fine-Grained Security for Spark and Hive

© Hortonworks Inc. 2015. All Rights Reserved

Page 8: Fine-Grained Security for Spark and Hive


Page 9: Ranger Architecture

[Figure: Ranger architecture diagram. Elements shown: Enterprise Users; Legacy Tools and Data Governance; Ranger Administration Portal; Ranger Policy Server Integration API; Ranger Audit Server; RDBMS; HDFS; Atlas; and Hadoop Components (HDFS, HBase, HiveServer2, Knox, NiFi, Solr, Kafka, YARN, Storm), each with a Ranger Plugin.]

Page 10: Ranger Audits - Data Access

Page 11: Ranger Audits - Admin Actions

Page 12: LLAP Overview

Page 13: Hive 2.0 and LLAP

⬢ At a high level:
  – 2000+ features, improvements and bug fixes in Hive since HDP 2.4.
  – 600+ of these from outside of Hortonworks.
⬢ Major improvements:
  – Preview: Hive LLAP: persistent query servers with intelligent in-memory caching.
  – ACID GA: hardened and proven at scale.
  – Expanded SQL compliance: more capable integration with BI tools.
  – Performance: interactive query, 2x faster ETL.
  – Security: row/column security extending to views, column-level security for Spark.

Page 14: Hive 2 with LLAP: Architecture Overview

Page 15: Hive 2 with LLAP: Open Interfaces

Page 16: Ranger Integration with Hive and LLAP

Page 17: Hive / LLAP Security Capabilities with Ranger

⬢ The Ranger Hive plugin provides authorization / access controls.
⬢ Column masking:
  – Injects Hive UDFs that mask characters or hash values.
  – Dynamic, per-user.
⬢ Dynamic row filtering:
  – The query is analyzed and policies are applied.
  – Dynamic, per-user.
⬢ All operations run as ordinary SQL queries:
  – Masking policies become expressions in the SQL SELECT clause.
  – Row filters become predicates in the SQL WHERE clause.
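One way to picture "masks become SELECT expressions, filters become WHERE predicates" is to replace each table reference with a subquery built from the user's policy. The sketch below only generates SQL text; `UserPolicy` and `rewrite_table_ref` are hypothetical names, and this is not how the Ranger Hive plugin is implemented internally.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class UserPolicy:
    # Hypothetical per-user policy: column -> masking expression template
    # ("{col}" is replaced by the column name), plus an optional row filter.
    masks: Dict[str, str] = field(default_factory=dict)
    row_filter: Optional[str] = None


def rewrite_table_ref(table: str, columns: List[str], policy: UserPolicy) -> str:
    """Return a subquery that stands in for `table` for this user."""
    select_items = []
    for col in columns:
        template = policy.masks.get(col)
        if template is None:
            select_items.append(col)  # unmasked column passes through unchanged
        else:
            # Masking becomes an expression in the SELECT clause,
            # aliased back to the original column name.
            select_items.append(f"{template.format(col=col)} AS {col}")
    sql = f"SELECT {', '.join(select_items)} FROM {table}"
    if policy.row_filter:
        # Row filtering becomes a predicate in the WHERE clause.
        sql += f" WHERE {policy.row_filter}"
    return f"({sql}) {table}"
```

A user's `SELECT ... FROM users` would then run against the returned subquery instead of the raw table, so the query itself stays ordinary SQL.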

Page 18: Native Hive Masking Capabilities

UDF | Purpose | Example start | Example result
mask | Convert letters to X/x and numbers to n. | 123 Fake St. | nnn Xxxx Xx.
mask_first_n | Mask only the first n characters. | 433-54-3937 | nnn-54-3937
mask_last_n | Mask only the last n characters. | 433-54-3937 | 433-54-nnnn
mask_show_first_n | Mask, showing only the first n characters. | 555-233-1234 | 555-nnn-nnnn
mask_show_last_n | Mask, showing only the last n characters. | 433-54-3937 | nnn-nn-3937
mask_hash | Produce a consistent hash of the field. | CA | 21f241cccaa5cfa33190f56ff1510e37
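The masking semantics in the table can be approximated in a few lines of Python. This is a sketch, not Hive's implementation: it handles ASCII letters and digits only, uses the X/x/n replacement characters from the table, and defaults `n` to 4 because that reproduces the example rows. `mask_hash` is left out because the table does not say which hash function produced its value.

```python
def _mask_char(c: str, upper: str = "X", lower: str = "x", digit: str = "n") -> str:
    # Replace ASCII letters and digits; keep everything else (spaces, '-', '.').
    if "A" <= c <= "Z":
        return upper
    if "a" <= c <= "z":
        return lower
    if "0" <= c <= "9":
        return digit
    return c


def mask(value: str) -> str:
    return "".join(_mask_char(c) for c in value)


def mask_first_n(value: str, n: int = 4) -> str:
    n = max(n, 0)
    return mask(value[:n]) + value[n:]


def mask_last_n(value: str, n: int = 4) -> str:
    cut = max(len(value) - max(n, 0), 0)
    return value[:cut] + mask(value[cut:])


def mask_show_first_n(value: str, n: int = 4) -> str:
    n = max(n, 0)
    return value[:n] + mask(value[n:])


def mask_show_last_n(value: str, n: int = 4) -> str:
    cut = max(len(value) - max(n, 0), 0)
    return mask(value[:cut]) + value[cut:]
```

For example, `mask_show_last_n("433-54-3937")` returns `"nnn-nn-3937"`, matching the table. Note that separators such as `-` count toward `n` but are never replaced.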

Page 19: Delivering Spark Security

Page 20: Key Features: Spark Column Security with LLAP

⬢ Fine-grained column-level access control for SparkSQL.
⬢ Fully dynamic policies per user; no views required.
⬢ Uses standard Ranger policies and tools to control access and masking policies.

Flow:
1. SparkSQL gets data locations, known as “splits”, from HiveServer2 and plans the query.
2. HiveServer2 authorizes access using Ranger. Per-user policies such as row filtering are applied.
3. Spark gets a modified query plan based on the dynamic security policy.
4. Spark reads data from LLAP. Filtering/masking is guaranteed by the LLAP server.
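The key property of this flow is that the client never receives unfiltered rows: authorization happens when splits are handed out, and enforcement happens where the data is read. A toy model of the four steps, assuming hypothetical stand-ins (`HiveServer2Stub`, `LlapDaemonStub`, `Split`, `SecurityPolicy`, `spark_read`) that are not the real HiveServer2, LLAP or Spark APIs. The toy also omits any protection of the split itself, which a real deployment would need so that a client cannot change the user a split carries.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

Row = Dict[str, str]


@dataclass(frozen=True)
class Split:
    # Hypothetical split: a slice of a table plus the user it was planned for.
    table: str
    start: int
    end: int
    user: str


@dataclass
class SecurityPolicy:
    # Hypothetical per-user policy resolved at planning time.
    row_filter: Callable[[Row], bool]
    masks: Dict[str, Callable[[str], str]]


def _identity(value: str) -> str:
    return value


class HiveServer2Stub:
    """Toy planner (steps 1-3): authorizes the user, then hands out splits."""

    def __init__(self, tables: Dict[str, List[Row]], acl: Dict[str, Set[str]]):
        self.tables = tables
        self.acl = acl

    def get_splits(self, user: str, table: str, split_size: int) -> List[Split]:
        if table not in self.acl.get(user, set()):
            raise PermissionError(f"{user} may not read {table}")
        n = len(self.tables[table])
        return [Split(table, i, min(i + split_size, n), user)
                for i in range(0, n, split_size)]


class LlapDaemonStub:
    """Toy reader (step 4): applies filters and masks before returning rows."""

    def __init__(self, tables: Dict[str, List[Row]], policies: Dict[str, SecurityPolicy]):
        self.tables = tables
        self.policies = policies

    def read(self, split: Split) -> List[Row]:
        policy = self.policies[split.user]
        out = []
        for row in self.tables[split.table][split.start:split.end]:
            if not policy.row_filter(row):
                continue  # filtered rows never leave the daemon
            out.append({col: policy.masks.get(col, _identity)(val)
                        for col, val in row.items()})
        return out


def spark_read(hs2: HiveServer2Stub, llap: LlapDaemonStub,
               user: str, table: str, split_size: int = 2) -> List[Row]:
    # The client plans from the splits, then reads each one through LLAP.
    rows: List[Row] = []
    for split in hs2.get_splits(user, table, split_size):
        rows.extend(llap.read(split))
    return rows
```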

Page 21: Example: Per-User Row Filtering by Region in SparkSQL

Page 22: Use Cases

Page 23: Demo Setup

⬢ Customer Users and Sales data in ORC (metadata in the Metastore)
⬢ Data can be accessed via SparkSQL or HiveServer2
⬢ Marketing needs access to Sales and Users data for analytics
⬢ The Fraud Investigation team needs access to data for fraud detection
⬢ The Billing team needs access to Sales and Users data for billing

Tables

Users: customer_id, customer_name, customer_email, customer_phone, customer_ccn, customer_state, customer_zip

Sales: customer_id, product_id, promotion_id, cookie_id, tracking_id

Group | User
Fraud | frank
Marketing | mark
Billing | bill

Page 24: Use Case 1: Restricting Column Access

This is a simple use case in which certain groups or users don’t have permission to view certain columns.

⬢ Billing group has access to all columns in table Users
⬢ Marketing group can’t access the credit card column (customer_ccn) in table Users

Users: customer_id, customer_name, customer_email, customer_phone, customer_ccn, customer_state, customer_zip

User / Column | customer_phone | customer_ccn
bill (Billing) | 😀 | 😀
mark (Marketing) | 😀 | 😡
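The decision in the table above can be sketched as a column-level grant check. The grant table and `check_columns` are hypothetical; the sketch assumes a request is allowed only when every column it references is granted, and returns the offending columns otherwise.

```python
from typing import Dict, List, Set

# Hypothetical column-level grants mirroring the Use Case 1 table.
COLUMN_GRANTS: Dict[str, Set[str]] = {
    "billing": {"customer_id", "customer_name", "customer_email", "customer_phone",
                "customer_ccn", "customer_state", "customer_zip"},
    "marketing": {"customer_id", "customer_name", "customer_email", "customer_phone",
                  "customer_state", "customer_zip"},  # no customer_ccn
}


def check_columns(group: str, requested: List[str]) -> List[str]:
    """Return the requested columns the group may NOT read (empty list = allowed)."""
    allowed = COLUMN_GRANTS.get(group, set())  # unknown groups get no columns
    return [c for c in requested if c not in allowed]
```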

Page 25: Ranger Policy - Restrict Columns

Page 26: Ranger Policy - Restrict Columns - Results

bill from Billing

mark from Marketing

Page 27: Ranger Policy - Restrict Columns - Audit Screen

Page 28: Use Case 2: Column Masking

In this use case, certain groups or users can’t see the real values of certain columns.

⬢ Billing group can see the real/raw values for all columns in table Users
⬢ Fraud group can only see masked values of PII and PCI fields in table Users

Users: customer_id, customer_name, customer_email, customer_phone, customer_ccn, customer_state, customer_zip

User / Column | customer_email, customer_phone, customer_ccn
bill (Billing) | 😀
frank (Fraud) | 😎
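Per-group masking of the PII/PCI columns can be sketched as a function applied to each returned row. The slide does not say which mask each column gets, so the choices in `MASK_POLICY` are assumptions, and `_mask`, `mask_show_last_4` and `apply_masks` are hypothetical helpers rather than Hive UDFs.

```python
from typing import Callable, Dict


def _mask(s: str) -> str:
    # Replace ASCII letters/digits with X, x, n; keep punctuation such as '@' and '-'.
    out = []
    for c in s:
        if "A" <= c <= "Z":
            out.append("X")
        elif "a" <= c <= "z":
            out.append("x")
        elif "0" <= c <= "9":
            out.append("n")
        else:
            out.append(c)
    return "".join(out)


def mask_show_last_4(value: str) -> str:
    cut = max(len(value) - 4, 0)
    return _mask(value[:cut]) + value[cut:]


# Hypothetical masking policy for Use Case 2: groups not listed see raw values.
MASK_POLICY: Dict[str, Dict[str, Callable[[str], str]]] = {
    "fraud": {
        "customer_email": _mask,
        "customer_phone": mask_show_last_4,
        "customer_ccn": mask_show_last_4,
    },
}


def apply_masks(group: str, row: Dict[str, str]) -> Dict[str, str]:
    masks = MASK_POLICY.get(group, {})
    return {col: masks[col](val) if col in masks else val for col, val in row.items()}
```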

Page 29: Ranger Policies - Mask Fields

Page 30: Ranger Policy - Column Masking - Results

bill from Billing

frank from Fraud

Page 31: Ranger Policy - Column Masking - Audit Screen

Page 32: Use Case 3: Row Filtering

In this use case, certain groups or users can’t see all the rows in certain tables.

⬢ Billing group can see all the rows in the table Users
⬢ Marketing can only see rows/data from their region in the table Users

Users: customer_id, customer_name, customer_email, customer_phone, customer_ccn, customer_state, customer_zip

User | Rows in Users table
bill (Billing) | 😀
mark (Marketing-CA) | Only CA users
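A Ranger row-filter policy is expressed as a SQL predicate, here something like `customer_state = 'CA'` for the Marketing-CA group. The same decision is sketched below in Python; the group keys in `ROW_FILTERS` and the deny-by-default rule for groups without a policy are assumptions of this sketch.

```python
from typing import Dict, List, Optional

Row = Dict[str, str]

# Hypothetical row-filter policy for Use Case 3: group -> customer_state it is
# restricted to; None means no filter (all rows).
ROW_FILTERS: Dict[str, Optional[str]] = {"billing": None, "marketing-ca": "CA"}


def filter_rows(group: str, rows: List[Row]) -> List[Row]:
    if group not in ROW_FILTERS:
        return []  # no policy for this group: deny by default (sketch assumption)
    state = ROW_FILTERS[group]
    if state is None:
        return list(rows)
    return [r for r in rows if r["customer_state"] == state]
```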

Page 33: Ranger Policies - Row Filtering

Page 34: Ranger Policy - Row Filtering - Results

bill from Billing

mark from Marketing

Page 35: Use Case 4: Row Filtering - Cross Table

This is an extension of the previous use cases, where the context information for filtering rows is in another table.

⬢ Billing group can see all the rows in the table Sales
⬢ Marketing can only see rows/data from their region in the table Sales; however, the Sales table doesn’t have customer geographic information, so it needs to be derived from the Users table

Users: customer_id, customer_name, customer_email, customer_phone, customer_ccn, customer_state, customer_zip

Sales: customer_id, product_id, promotion_id, cookie_id, tracking_id

User | Rows in Sales table
bill (Billing) | 😀
mark (Marketing-CA) | Only CA users
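Because Sales has no region column, the filter has to look the region up in Users. As a SQL predicate on Sales this can be written with a subquery, for example `customer_id IN (SELECT customer_id FROM users WHERE customer_state = 'CA')`. The Python sketch below performs the same semi-join; `filter_sales_by_state` is a hypothetical helper.

```python
from typing import Dict, List

Row = Dict[str, str]


def filter_sales_by_state(sales: List[Row], users: List[Row], state: str) -> List[Row]:
    """Keep Sales rows whose customer lives in `state`, using Users for the lookup.

    Equivalent in spirit to the SQL predicate
    customer_id IN (SELECT customer_id FROM users WHERE customer_state = '<state>').
    """
    # Build the set of customer_ids in the region once, then semi-join.
    allowed_ids = {u["customer_id"] for u in users if u["customer_state"] == state}
    return [s for s in sales if s["customer_id"] in allowed_ids]
```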

Page 36: Ranger Policies - Row Filtering - Cross Table

Page 37: Apache Atlas Integration

Page 38: Cross Product Symbiosis

[Diagram: Apache Atlas (classification/tagging, governance, lineage), Apache Ranger (tag-based policies, dynamic custom policies) and LLAP (enforcement hooks), over HDFS, S3 and the Metastore.]

* Column masking and row filtering are not yet supported by tag-based policies.
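Tag-based policies decouple the policy from the resource: the classification lives in Atlas, and a Ranger policy refers to the tag rather than to individual columns. A minimal sketch under that reading, with hypothetical tag names, column names and group names; consistent with the footnote above, it models only access denial, not masking or row filtering.

```python
from typing import Dict, List, Set

# Hypothetical classifications, as Atlas might attach them to columns.
COLUMN_TAGS: Dict[str, Set[str]] = {
    "users.customer_email": {"PII"},
    "users.customer_phone": {"PII"},
    "users.customer_ccn": {"PII", "PCI"},
}

# Hypothetical tag-based policy: tag -> groups denied access to any column carrying it.
TAG_DENY: Dict[str, Set[str]] = {"PCI": {"marketing"}}


def denied_columns(group: str, columns: List[str]) -> List[str]:
    """Columns the group may not read because of a tag on the column."""
    return [
        col for col in columns
        if any(group in TAG_DENY.get(tag, set()) for tag in COLUMN_TAGS.get(col, set()))
    ]
```

Tagging a new column as PCI in this model changes who can read it without editing any policy, which is the benefit the slide attributes to the Atlas integration.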

Page 39: Ranger - Tag Based Policies

Page 40: Q & A