
Uploaded by datatorrent, posted on 12-Apr-2017


Page 1: ABDW17-Lightning Talks track-Security in a Streaming Application on Hadoop: A Case Study with Apache Apex

Security in streaming applications

A case study with Apache Apex

Pramod Immaneni
PMC Apex & Chief Architect, DT

[email protected]

Page 2: Apache Apex

• Stream processing platform
    • In-memory, distributed
• Simple programming model
    • Write your own custom logic, pipelining
• Scalable
    • High throughput, low latency
    • Dynamic scaling in response to SLAs
• Fault tolerant
    • Node outages, Hadoop outages
    • Stateful recovery, incremental recovery
    • End-to-end exactly-once
• Productivity library
    • Commonly needed connectors, business logic
    • Production tested
• Operability – DataTorrent RTS
    • Deployment and monitoring console
    • Deep introspection and debugging

Page 3: Apache Apex and DataTorrent Product Stack

Designed to help you at every stage of your data-in-motion pipeline

[Diagram: layered product stack]
• Solutions for Business Problems: Ingestion & Data Prep, ETL Pipelines
• Ease of Use Tools: Real-Time Data Visualization, Management & Monitoring, GUI Application Assembly
• Application Templates (Kafka, HDFS, JDBC)
• Apex-Malhar Operator Library
• High-level API, Transformation, ML & Score, SQL, Analytics, FileSync, Dev Framework, Batch Support
• Apache Apex Core
• Big Data Infrastructure: Hadoop 2.x – YARN + HDFS – On Prem & Cloud

Page 4: Application Development Model

A Stream is a sequence of data tuples. A typical Operator takes one or more input streams, performs computations, and emits one or more output streams.

• Each Operator is YOUR custom business logic in Java, or a built-in operator from our open source library
• An Operator has many instances that run in parallel, and each instance is single-threaded

A Directed Acyclic Graph (DAG) is made up of operators and streams.

[Diagram: Directed Acyclic Graph (DAG) of operators connected by streams of tuples: filtered stream, enriched stream, output stream]
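The operator/stream model can be illustrated with a minimal, single-process sketch. It is not the Apex API (Apex operators are Java classes deployed into YARN containers); the class `DagSketch`, its methods, and the sample filter and enrich logic are hypothetical. It shows tuples flowing along streams in topological order, with each operator handled by one sequential instance and no partitioning.

```python
from collections import defaultdict, deque
from typing import Callable, Dict, Iterable, List

# An operator turns one input tuple into zero or more output tuples.
Operator = Callable[[dict], Iterable[dict]]


class DagSketch:
    """Toy, single-process model of operators connected by streams."""

    def __init__(self) -> None:
        self.operators: Dict[str, Operator] = {}
        self.downstream: Dict[str, List[str]] = defaultdict(list)

    def add_operator(self, name: str, fn: Operator) -> None:
        self.operators[name] = fn

    def add_stream(self, upstream: str, downstream: str) -> None:
        self.downstream[upstream].append(downstream)

    def topological_order(self) -> List[str]:
        # Kahn's algorithm; it also rejects graphs that are not acyclic.
        indegree = {name: 0 for name in self.operators}
        for targets in self.downstream.values():
            for target in targets:
                indegree[target] += 1
        ready = deque(name for name, deg in indegree.items() if deg == 0)
        order: List[str] = []
        while ready:
            name = ready.popleft()
            order.append(name)
            for target in self.downstream[name]:
                indegree[target] -= 1
                if indegree[target] == 0:
                    ready.append(target)
        if len(order) != len(self.operators):
            raise ValueError("streams form a cycle; a DAG is required")
        return order

    def run(self, source: str, tuples: Iterable[dict]) -> Dict[str, List[dict]]:
        # Each operator processes its whole input in turn, like one
        # single-threaded instance per operator.
        inbox: Dict[str, List[dict]] = defaultdict(list)
        inbox[source].extend(tuples)
        emitted_by: Dict[str, List[dict]] = {}
        for name in self.topological_order():
            emitted = [out for t in inbox[name] for out in self.operators[name](t)]
            emitted_by[name] = emitted
            for target in self.downstream[name]:
                inbox[target].extend(emitted)
        return emitted_by


dag = DagSketch()
dag.add_operator("input", lambda t: [t])
dag.add_operator("filter", lambda t: [t] if t["amount"] > 100 else [])
dag.add_operator("enrich", lambda t: [{**t, "tier": "large"}])
dag.add_operator("output", lambda t: [t])
dag.add_stream("input", "filter")   # input stream
dag.add_stream("filter", "enrich")  # filtered stream
dag.add_stream("enrich", "output")  # enriched stream
result = dag.run("input", [{"id": 1, "amount": 50}, {"id": 2, "amount": 500}])
```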

Page 5: Native Hadoop Integration

• YARN is the resource manager
• HDFS is used for storing persistent state

Page 6: Components

• Secure Hadoop
    • Kerberos security
    • Delegation tokens
    • All interactions between distributed components are authenticated
    • Kerberos enabled for Hadoop web services and management pages
• Running Apex on secure Hadoop
    • Apex CLI
    • Apex applications
• Running DT Console and Gateway on secure Hadoop

Page 7: Kerberos and Delegation tokens

• Kerberos
    • Authentication in multi-user, multi-node computing environments
        • Between users and services
        • Between services across nodes
    • Mutual authentication
    • Use of a central trusted service
    • Symmetric keys
    • Created at MIT
• Delegation Tokens
    • Used when a party does not have Kerberos credentials
        • Non-fixed clients such as application containers
    • A byte sequence created from different fields such as user information, timestamp and keys
    • Tokens have an expiry period
    • Clients provide tokens and the services verify them
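To make "a byte sequence created from fields and keys" concrete, here is a simplified delegation-token sketch. The field layout, the SHA-256 HMAC, and the names `make_token` and `verify_token` are assumptions for illustration; Hadoop's real token identifiers carry more fields and use their own serialization. What it shows is the idea on the slide: the service derives a token password from the identifier using a secret key, and later checks both the keyed hash and the expiry.

```python
import hashlib
import hmac
import struct
from typing import Tuple


def make_token(service_key: bytes, owner: str, issue_ms: int, expiry_ms: int) -> Tuple[bytes, bytes]:
    """Return (identifier, password) for a hypothetical delegation token."""
    owner_bytes = owner.encode("utf-8")
    # Identifier: length-prefixed owner name, then issue and expiry timestamps.
    identifier = struct.pack(">H", len(owner_bytes)) + owner_bytes + struct.pack(">qq", issue_ms, expiry_ms)
    # Password: keyed hash of the identifier; only the key holder can produce it.
    password = hmac.new(service_key, identifier, hashlib.sha256).digest()
    return identifier, password


def verify_token(service_key: bytes, identifier: bytes, password: bytes, now_ms: int) -> bool:
    expected = hmac.new(service_key, identifier, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, password):
        return False  # forged or tampered token
    _issue_ms, expiry_ms = struct.unpack(">qq", identifier[-16:])
    return now_ms < expiry_ms  # the token stops working after its expiry time


DAY_MS = 24 * 60 * 60 * 1000
KEY = b"hypothetical-service-secret"
ident, pw = make_token(KEY, "alice", issue_ms=0, expiry_ms=7 * DAY_MS)
valid_now = verify_token(KEY, ident, pw, now_ms=DAY_MS)
valid_later = verify_token(KEY, ident, pw, now_ms=8 * DAY_MS)
tampered = ident[:-1] + bytes([ident[-1] ^ 1])
valid_tampered = verify_token(KEY, tampered, pw, now_ms=DAY_MS)
```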

Page 8: Apex CLI

• Uses Kerberos credentials to authenticate with Hadoop
• Sets up delegation tokens for STRAM (Streaming Application Manager) during launch
    • ResourceManager (RM) and NameNode (NN) delegation tokens
    • Supports HA configuration
• Sets up credentials for token refresh (discussed later)
• Impersonation
    • Can proxy as a specified user different from the one in the Kerberos credentials
    • Requires extra Hadoop configuration

Page 9: CLI configuration

• Short-lived applications: log in to Kerberos using kinit

    kinit -k -t path-to-keytab-file kerberos-principal

• Long-lived applications: configuration in dt-site.xml

    <property>
      <name>dt.authentication.principal</name>
      <value>kerberos-principal</value>
    </property>
    <property>
      <name>dt.authentication.keytab</name>
      <value>keytab-file</value>
    </property>
    <property>
      <name>dt.authentication.store.keytab</name>
      <value>hdfs-path-to-keytab-file</value>
    </property>
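The long-lived configuration can be checked before launch. The sketch below is hypothetical and is not how the Apex CLI reads its configuration: it parses a dt-site.xml fragment (assumed to use the usual Hadoop-style `<configuration>` root) and reports which of the three `dt.authentication.*` keys listed above are missing. Treating exactly those three keys as required is an assumption drawn from the slide.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical dt-site.xml content; the values are the slide's placeholders.
DT_SITE_XML = """
<configuration>
  <property>
    <name>dt.authentication.principal</name>
    <value>kerberos-principal</value>
  </property>
  <property>
    <name>dt.authentication.keytab</name>
    <value>keytab-file</value>
  </property>
  <property>
    <name>dt.authentication.store.keytab</name>
    <value>hdfs-path-to-keytab-file</value>
  </property>
</configuration>
"""

LONG_LIVED_KEYS = (
    "dt.authentication.principal",
    "dt.authentication.keytab",
    "dt.authentication.store.keytab",
)


def load_properties(xml_text: str) -> Dict[str, str]:
    """Collect <property><name>/<value> pairs into a dict."""
    root = ET.fromstring(xml_text.strip())
    props: Dict[str, str] = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def missing_long_lived_settings(props: Dict[str, str]) -> List[str]:
    """Return the assumed-required keys that are absent or empty."""
    return [key for key in LONG_LIVED_KEYS if not props.get(key)]


props = load_properties(DT_SITE_XML)
missing = missing_long_lived_settings(props)
```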

Page 10: Apex application architecture

Page 11: STRAM

• Uses delegation tokens when communicating with Hadoop
    • Refreshes them before they expire
• Hadoop delegation tokens for Streaming Containers
    • Seeds them during launch
• STRAM delegation tokens
    • Used for RPC communication between Streaming Containers and STRAM
    • Created and seeded by STRAM when containers are launched
• Buffer server tokens
    • Used for authentication between the buffer server and its clients
    • Peer-to-peer authentication between containers
    • Created and seeded by STRAM during container deployment
    • Persisted in case containers fail and restart
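One way to picture "created and seeded by STRAM, persisted in case containers fail and restart" is a per-container token store whose contents survive a restart, so a redeployed container is seeded with the same token. Everything here is hypothetical: `BufferServerTokenStore`, the local JSON file standing in for whatever durable state STRAM actually keeps, and the 20-byte random token size.

```python
import json
import secrets
import tempfile
from pathlib import Path
from typing import Dict


class BufferServerTokenStore:
    """Hypothetical STRAM-side store: one buffer server token per container, persisted."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.tokens: Dict[str, str] = json.loads(path.read_text()) if path.exists() else {}

    def token_for(self, container_id: str) -> str:
        token = self.tokens.get(container_id)
        if token is None:
            token = secrets.token_hex(20)  # 20 random bytes, hex-encoded
            self.tokens[container_id] = token
            self.path.write_text(json.dumps(self.tokens))  # persist before seeding
        return token


with tempfile.TemporaryDirectory() as tmp:
    store_file = Path(tmp) / "buffer-server-tokens.json"
    first = BufferServerTokenStore(store_file).token_for("container_01")
    # A fresh store instance stands in for redeploying after a failure.
    after_restart = BufferServerTokenStore(store_file).token_for("container_01")
    other = BufferServerTokenStore(store_file).token_for("container_02")
```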

Page 12: Streaming Container

• Delegation tokens from STRAM
    • Uses Hadoop delegation tokens for Hadoop services
    • Uses the STRAM delegation token to communicate with STRAM
• Buffer server token
    • Receives it in the initial deployment context (StreamingContainerContext)
    • Starts the buffer server with this token
• Buffer server client tokens
    • Receives client tokens for input and output ports in the operator deployment context (InputDeployInfo and OutputDeployInfo)
    • Seeds the buffer server clients with these tokens
    • Used in communication with the buffer server

Page 13: Buffer Server

• Each buffer server has its own token
    • Seeded during start
• Clients have to provide the token to authenticate and receive services
    • BufferServerPublisher – used by operators to send data to the buffer server
    • BufferServerSubscriber – receives data from the buffer server for operators
    • BufferServerController – used by STRAM to communicate with the buffer server to perform maintenance tasks
• Work in progress (WIP)
    • Provide more options such as multiple tokens and token refresh
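The buffer server's check can be sketched as a server holding a single token and rejecting any client, whether publisher, subscriber, or controller, that presents a different one. The classes and request strings are hypothetical stand-ins for the Java classes named on the slide; using `hmac.compare_digest` for a constant-time comparison is a design choice of this sketch, not a documented detail of Apex.

```python
import hmac
import secrets
from enum import Enum
from typing import List, Tuple


class ClientKind(Enum):
    PUBLISHER = "BufferServerPublisher"    # operators sending data
    SUBSCRIBER = "BufferServerSubscriber"  # operators receiving data
    CONTROLLER = "BufferServerController"  # STRAM maintenance requests


class BufferServerSketch:
    def __init__(self, token: bytes) -> None:
        self._token = token  # seeded once, when the server starts
        self.served: List[Tuple[str, str]] = []

    def handle(self, kind: ClientKind, presented_token: bytes, request: str) -> str:
        # Constant-time comparison so response timing does not leak the token.
        if not hmac.compare_digest(self._token, presented_token):
            return "rejected"
        self.served.append((kind.value, request))
        return "ok"


server_token = secrets.token_bytes(20)  # would come from the deployment context
server = BufferServerSketch(server_token)
publish = server.handle(ClientKind.PUBLISHER, server_token, "publish window 1")
intruder = server.handle(ClientKind.SUBSCRIBER, b"guessed-token", "subscribe")
purge = server.handle(ClientKind.CONTROLLER, server_token, "purge old windows")
```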

Page 14: STRAM web services

• STRAM web service interface
    • To query details of an application, such as health, status and statistics
    • To make changes to the application
• Challenges
    • Cannot use Kerberos, for the same reasons as STRAM RPC
    • Clients will not have delegation tokens to start with
• Hybrid approach
    • Clients authenticate with the Resource Manager proxy using Kerberos
    • The proxy communicates with the STRAM web service filter without any credentials
    • The filter only accepts credential-less requests from the proxy
    • The filter sends an authentication token back to the client
    • Clients use the authentication token for all future communication with STRAM
    • RTS Gateway and MapReduce use this approach
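The hybrid flow can be modeled as a filter that trusts credential-less requests only when they arrive from the RM proxy host, and then hands out a token for direct use. This is a simplified sketch under stated assumptions: the proxy host name, the `Request` type, and the in-memory token set are hypothetical, and the Kerberos check at the proxy happens before a request reaches this code, so it is not modeled.

```python
import secrets
from dataclasses import dataclass
from typing import Optional, Set, Tuple


@dataclass
class Request:
    remote_host: str
    auth_token: Optional[str] = None


class StramWebFilterSketch:
    def __init__(self, proxy_hosts: Set[str]) -> None:
        self.proxy_hosts = proxy_hosts
        self.issued: Set[str] = set()

    def check(self, request: Request) -> Tuple[int, Optional[str]]:
        """Return (HTTP status, newly issued token or None)."""
        if request.auth_token is not None:
            # Direct call: accept only tokens this filter issued earlier.
            return (200, None) if request.auth_token in self.issued else (401, None)
        if request.remote_host in self.proxy_hosts:
            # Credential-less call from the RM proxy, which already authenticated the user.
            token = secrets.token_urlsafe(16)
            self.issued.add(token)
            return 200, token
        return 401, None  # credential-less call that did not come through the proxy


web_filter = StramWebFilterSketch({"rm-proxy.example.com"})
via_proxy_status, issued_token = web_filter.check(Request("rm-proxy.example.com"))
direct_status, _ = web_filter.check(Request("client.example.com", issued_token))
no_token_status, _ = web_filter.check(Request("client.example.com"))
```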

Page 15: Web services authentication

Page 16: Configuration

• Web service authentication can be enabled or disabled per application
• Hadoop also does not make it mandatory in secure mode

    <property>
      <name>dt.application.name.attr.STRAM_HTTP_AUTHENTICATION</name>
      <value>security-option</value>
    </property>

  where name is the application name.

• Security options
    • ENABLE: enable authentication
    • FOLLOW_HADOOP_AUTH: enable authentication if secure mode is enabled in Hadoop (the default)
    • FOLLOW_HADOOP_HTTP_AUTH: enable authentication only if HTTP authentication is enabled in Hadoop, not just secure mode
    • DISABLE: disable authentication
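The four options reduce to a small decision function. This sketch assumes the inputs `hadoop_secure` and `hadoop_http_auth` have already been read from the Hadoop configuration; the function names are hypothetical, and `FOLLOW_HADOOP_AUTH` is used when the attribute is not set, matching the default stated above.

```python
from enum import Enum
from typing import Optional


class StramHttpAuthentication(Enum):
    ENABLE = "ENABLE"
    FOLLOW_HADOOP_AUTH = "FOLLOW_HADOOP_AUTH"
    FOLLOW_HADOOP_HTTP_AUTH = "FOLLOW_HADOOP_HTTP_AUTH"
    DISABLE = "DISABLE"


def parse_option(value: Optional[str]) -> StramHttpAuthentication:
    # FOLLOW_HADOOP_AUTH is the default when the attribute is not set.
    return StramHttpAuthentication(value) if value else StramHttpAuthentication.FOLLOW_HADOOP_AUTH


def web_auth_enabled(option: StramHttpAuthentication, hadoop_secure: bool, hadoop_http_auth: bool) -> bool:
    if option is StramHttpAuthentication.ENABLE:
        return True
    if option is StramHttpAuthentication.DISABLE:
        return False
    if option is StramHttpAuthentication.FOLLOW_HADOOP_AUTH:
        return hadoop_secure  # secure mode alone is enough
    return hadoop_http_auth  # FOLLOW_HADOOP_HTTP_AUTH: secure mode alone is not enough
```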

Page 17: Delegation Token refresh

• Delegation tokens in Hadoop expire after 7 days (the default maximum lifetime)
    • Long-running applications are then killed
• Options
    • Configure Hadoop to increase the expiry time: not practical
    • Can the application get new tokens before the current tokens expire?
• Auto-refresh
    • STRAM and Streaming Containers request new delegation tokens from Hadoop services before the current ones expire
    • Requesting new tokens requires Kerberos credentials
    • Kerberos credentials and other configuration are provided by the Apex CLI when the application is launched

Page 18: Configuration

    <property>
      <name>dt.authentication.store.keytab</name>
      <value>hdfs-path-to-keytab-file</value>
    </property>
    <property>
      <name>dt.resourcemanager.delegation.token.max-lifetime</name>
      <value>604800000</value>
    </property>
    <property>
      <name>dt.namenode.delegation.token.max-lifetime</name>
      <value>604800000</value>
    </property>
    <property>
      <name>dt.authentication.token.refresh.factor</name>
      <value>0.7</value>
    </property>
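Assuming the refresh factor is applied to the configured maximum lifetime and measured from the token's issue time (the slide lists the settings but not the formula), the values above give a refresh point of 0.7 × 604,800,000 ms = 423,360,000 ms, about 4.9 days into the 7-day lifetime. The sketch below computes that point; the function names are hypothetical, and the factor is parsed with `Fraction` so the arithmetic is exact.

```python
import math
from fractions import Fraction

MS_PER_DAY = 24 * 60 * 60 * 1000


def refresh_at_ms(issue_ms: int, max_lifetime_ms: int, refresh_factor: str) -> int:
    """Time at which new tokens should be requested (assumed formula)."""
    factor = Fraction(refresh_factor)  # "0.7" becomes exactly 7/10
    if not 0 < factor < 1:
        raise ValueError("refresh factor must be strictly between 0 and 1")
    return issue_ms + math.floor(factor * max_lifetime_ms)


def needs_refresh(now_ms: int, issue_ms: int, max_lifetime_ms: int, refresh_factor: str) -> bool:
    return now_ms >= refresh_at_ms(issue_ms, max_lifetime_ms, refresh_factor)


MAX_LIFETIME_MS = 604800000  # the max-lifetime value from the configuration above
deadline = refresh_at_ms(0, MAX_LIFETIME_MS, "0.7")
```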

Page 19: DTGateway

• Backend service for the UI Console
• On secure Hadoop
    • Interacts with Kerberos-enabled Hadoop
    • Works with secure applications
• Supports user authentication into the UI console
    • LDAP, Active Directory (AD), Kerberos, etc.

Page 20: DTGateway security architecture

Page 21: Configuration

• Uses Kerberos credentials to authenticate with Hadoop
• Kerberos credentials can be configured during installation

    <property>
      <name>dt.gateway.authentication.principal</name>
      <value>[kerberos-principal]</value>
    </property>
    <property>
      <name>dt.gateway.authentication.keytab</name>
      <value>[keytab-file]</value>
    </property>

• Uses Kerberos over HTTP (SPNEGO) when interacting with Hadoop web services that have Kerberos enabled

Page 22: Application launch & Impersonation

• Applications are launched via the CLI using dtGateway's own Kerberos credentials
• The user the applications run as on the Hadoop side can be configured
    • Specified as a configuration setting
• The possible values for the user strategy are
    • AUTH_USER: the app runs as the authenticated user; the default if not configured
    • GATEWAY_USER: the app runs as the same user the dtGateway process is running under
    • SPECIFIED_USER: the app runs as the specified user

    <property>
      <name>dt.gateway.hadoop.user.strategy</name>
      <value>[user-strategy]</value>
    </property>
    <property>
      <name>dt.gateway.hadoop.user.name</name>
      <value>specific-username</value>
    </property>
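The user-strategy setting can be read as a lookup from the configured value to the Hadoop user the application runs as. The function below is a hypothetical sketch; in particular, it assumes `dt.gateway.hadoop.user.name` supplies the user for `SPECIFIED_USER`, which the configuration above suggests but the slide does not state outright.

```python
from enum import Enum
from typing import Optional


class UserStrategy(Enum):
    AUTH_USER = "AUTH_USER"
    GATEWAY_USER = "GATEWAY_USER"
    SPECIFIED_USER = "SPECIFIED_USER"


def hadoop_run_as(strategy: Optional[str], authenticated_user: str,
                  gateway_user: str, specified_user: Optional[str]) -> str:
    # AUTH_USER is the default when no strategy is configured.
    chosen = UserStrategy(strategy) if strategy else UserStrategy.AUTH_USER
    if chosen is UserStrategy.AUTH_USER:
        return authenticated_user
    if chosen is UserStrategy.GATEWAY_USER:
        return gateway_user
    if not specified_user:
        # Assumed source of the name: dt.gateway.hadoop.user.name
        raise ValueError("SPECIFIED_USER needs a user name")
    return specified_user
```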

Page 23: Hadoop configuration

• Additional configuration is needed on the Hadoop side

    <property>
      <name>hadoop.proxyuser.[username].groups</name>
      <value>*</value>
    </property>
    <property>
      <name>hadoop.proxyuser.[username].hosts</name>
      <value>*</value>
    </property>

• The username above should be the dtGateway Kerberos username
• Allows dtGateway to connect with its Kerberos username and launch apps as other users
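The effect of the two `hadoop.proxyuser` settings can be sketched as a check made before the dtGateway user is allowed to act for someone else. This is a simplified model, not Hadoop's implementation: real Hadoop also supports a `hadoop.proxyuser.[username].users` setting and IP ranges for hosts, both omitted here, and the user, group, and host names are hypothetical.

```python
from typing import Dict, Iterable


def _matches(setting: str, candidates: Iterable[str]) -> bool:
    setting = setting.strip()
    if setting == "*":
        return True  # wildcard: any group / any host
    allowed = {item.strip() for item in setting.split(",") if item.strip()}
    return any(candidate in allowed for candidate in candidates)


def may_impersonate(conf: Dict[str, str], superuser: str,
                    user_groups: Iterable[str], remote_host: str) -> bool:
    groups = conf.get(f"hadoop.proxyuser.{superuser}.groups", "")
    hosts = conf.get(f"hadoop.proxyuser.{superuser}.hosts", "")
    # Both settings must allow the request; an unconfigured superuser is refused.
    return _matches(groups, user_groups) and _matches(hosts, [remote_host])


open_conf = {"hadoop.proxyuser.dtgateway.groups": "*", "hadoop.proxyuser.dtgateway.hosts": "*"}
narrow_conf = {"hadoop.proxyuser.dtgateway.groups": "analysts",
               "hadoop.proxyuser.dtgateway.hosts": "gw1.example.com"}
```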

Page 24: Authentication with STRAM

• Security is enabled for STRAM web services
• dtGateway first obtains a security cookie by connecting to STRAM via the RM proxy
    • Makes a web service request to the STRAM web service path /ws via the RM proxy
    • Receives a cookie called dt-client
• Subsequently uses the cookie for all direct communication with STRAM
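On the client side, the flow amounts to one request through the RM proxy and then reusing the dt-client cookie for direct calls. The sketch below only builds URLs and parses headers with the standard library, without any network calls; the host, application ID, and cookie value are hypothetical, and the `/proxy/<application id>/` prefix assumes the standard YARN web-proxy URL layout.

```python
from http.cookies import SimpleCookie
from typing import Dict, Optional


def proxy_ws_url(rm_web_address: str, application_id: str) -> str:
    # Assumes the standard YARN web-proxy layout: /proxy/<application id>/...
    return f"http://{rm_web_address}/proxy/{application_id}/ws"


def dt_client_cookie(set_cookie_header: str) -> Optional[str]:
    """Extract the dt-client cookie value from a Set-Cookie header, if present."""
    jar = SimpleCookie()
    jar.load(set_cookie_header)
    morsel = jar.get("dt-client")
    return morsel.value if morsel is not None else None


def direct_request_headers(cookie_value: str) -> Dict[str, str]:
    # Later requests go straight to STRAM and carry the cookie instead of Kerberos.
    return {"Cookie": f"dt-client={cookie_value}"}


first_url = proxy_ws_url("rm.example.com:8088", "application_1_0001")
token = dt_client_cookie("dt-client=abc123; Path=/")
headers = direct_request_headers(token) if token else {}
```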

Page 26: Q&A