Advanced Security in Hadoop Cluster

View Hadoop Administration Course at www.edureka.co/hadoop-admin

Upload: edureka
Posted on 14-Aug-2015

TRANSCRIPT

Advanced Security in Hadoop Cluster

Slide 2: Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for Questions

Objectives

At the end of this module, you will be able to

» Hadoop Cluster introduction
» Recommended configuration for a cluster
» Hadoop cluster running modes
» Hadoop security with Kerberos
» HDFS security with ACLs (Access Control Lists)
» Hadoop Admin responsibilities
» Demo on security

Slide 3

Hadoop Core Components

Hadoop 2.x Core Components

HDFS (Storage)
» Master: Active NameNode, Standby NameNode
» Slave: DataNode

YARN (Processing)
» Master: Resource Manager
» Slave: Node Manager

Slide 4

Hadoop Cluster: A Typical Use Case

Active NameNode and Standby NameNode (Standby is optional):
» RAM: 64 GB, Hard disk: 1 TB, Processor: Xeon with 8 cores, Ethernet: 3 x 10 GB/s, OS: 64-bit CentOS, Power: redundant power supply

Secondary NameNode:
» RAM: 32 GB, Hard disk: 1 TB, Processor: Xeon with 4 cores, Ethernet: 3 x 10 GB/s, OS: 64-bit CentOS, Power: redundant power supply

DataNodes:
» RAM: 16 GB, Hard disk: 6 x 2 TB, Processor: Xeon with 2 cores, Ethernet: 3 x 10 GB/s, OS: 64-bit CentOS

Slide 5

Slave Nodes: Recommended Configuration

Higher-performance vs. lower-performance components: save the money, buy more nodes!
"A cluster with more nodes performs better than one with fewer, slightly faster nodes."

General ('base') configuration for a slave node (depends on requirements):

» 4 x 1 TB or 2 TB hard drives, in a JBOD* configuration
» Do not use RAID!
» 2 x quad-core CPUs
» 24-32 GB RAM
» Gigabit Ethernet

Multiples of (1 hard drive + 2 cores + 6-8 GB RAM) generally work well for many types of applications.
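The "multiples" rule above can be turned into a quick sizing check. A minimal sketch (the function name and the 4-drive example are illustrative, not from the slides):

```python
# Scale the base unit (1 hard drive + 2 cores + 6-8 GB RAM) described above.
def slave_node_spec(n_drives: int, ram_per_unit_gb: int = 8):
    """Return a (drives, cores, ram_gb) tuple for n_drives base units."""
    assert 6 <= ram_per_unit_gb <= 8, "rule of thumb uses 6-8 GB RAM per unit"
    return n_drives, 2 * n_drives, ram_per_unit_gb * n_drives

# 4 drives -> 8 cores (2 x quad-core) and 32 GB RAM,
# matching the 'base' slave configuration listed above.
print(slave_node_spec(4))   # (4, 8, 32)
```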

Slide 6

Hadoop Cluster Modes

Hadoop can run in any of the following three modes:

Standalone (or Local) Mode
» No daemons; everything runs in a single JVM
» Suitable for running MapReduce programs during development
» Has no DFS

Pseudo-Distributed Mode
» Hadoop daemons run on the local machine

Fully-Distributed Mode
» Hadoop daemons run on a cluster of machines
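Pseudo-distributed mode is typically enabled with a minimal core-site.xml and hdfs-site.xml. A hedged sketch (the localhost:9000 address and the replication factor are common defaults, not taken from the slides):

```xml
<!-- core-site.xml: point the default filesystem at a local NameNode -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

<!-- hdfs-site.xml: a single local DataNode can hold only one replica -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```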

Slide 7

Security issues in Hadoop Cluster

» Unauthorized clients can impersonate authorized users and access the cluster

» Clients can fetch blocks directly from the DataNodes, bypassing the NameNode

» Data packets sent by DataNodes to clients can be eavesdropped

» Not all users should have access to sensitive data

» No user verification for MapReduce code execution; a malicious user could submit a job

» Insecure network transport

» No message-level security

Slide 8

Hadoop security considerations

Authentication

Authorization

Access control

Data masking and encryption

Network security

Integrity

Confidentiality

Audits and event monitoring

Slide 9

Hadoop Authentication with Kerberos

Slide 10

Kerberos to the rescue

Network authentication protocol

Developed at MIT in the mid 1980s

Easy for administrators to manage passwords by storing them centrally

Enhance security by ensuring no clear text passwords are transmitted

Allow users to access different services with the same password

Available as open source or in supported commercial software

Slide 11

Kerberos Design Requirements

Interactions between hosts and clients should be encrypted.

Must be convenient for users (or they won’t use it).

Protect against intercepted credentials.

Kerberos is based on the secret-key distribution model:

» Keys are the basis of authentication in Kerberos
» A key is typically a short sequence of bytes
» The same key is used to both encrypt and decrypt
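The shared-secret idea can be sketched with a message authentication code: both parties hold the same key, and the key itself never travels over the network. This is an illustration of the secret-key model, not the actual Kerberos protocol; the key values are hypothetical:

```python
import hmac
import hashlib
import time

SECRET = b"client-password-derived-key"   # known only to the client and the KDC

def make_authenticator(key: bytes, timestamp: float) -> bytes:
    """Client proves knowledge of the key by MACing a timestamp."""
    msg = str(int(timestamp)).encode()
    return hmac.new(key, msg, hashlib.sha256).digest()

def verify_authenticator(key: bytes, timestamp: float, tag: bytes,
                         max_skew: int = 300) -> bool:
    """Server recomputes the MAC with its own copy of the key."""
    if abs(time.time() - timestamp) > max_skew:  # reject stale requests (replay window)
        return False
    expected = make_authenticator(key, timestamp)
    return hmac.compare_digest(expected, tag)

now = time.time()
tag = make_authenticator(SECRET, now)
print(verify_authenticator(SECRET, now, tag))        # True: same shared key
print(verify_authenticator(b"wrong-key", now, tag))  # False: attacker lacks the key
```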

Slide 12

Kerberos Components & Terminology

» Kerberos Client

» Kerberos Server

» Kerberos Key Distribution Center (KDC), which comprises:
  » Authentication Server (AS)
  » Ticket-Granting Server (TGS)

Users and services in a Kerberos realm are known as Principals.

Slide 13

Kerberos to the rescue

Kerberos Integration

» User authentication
» User and group access control lists at the cluster level
» Tokens: Delegation tokens, Job tokens, Block Access tokens
» Simple Authentication and Security Layer (SASL) with the RPC digest mechanism

The client talks to the Kerberos Key Distribution Center (Authentication Server + Ticket Granting Server) and then to the server in three phases:

1. Authentication: get TGT
2. Authorization: get service ticket
3. Service request: start service session

Slide 14

Kerberos to the rescue

Message flow between the Client, the Kerberos Key Distribution Center (Authentication Server + Ticket Granting Server), and the Server:

1. Client requests a TGT (Auth)
2. AS responds with an encrypted session key + TGT (TGT + Sk1)
3. Client requests a service ticket by presenting the TGT
4. TGS responds with an encrypted session key and a ticket granting access to the service (TGT + Sk2)
5. Client authenticates to the server with the service ticket (Auth + TGT)
6. Server responds with an encrypted timestamp (Sk2 + Auth)

Auth -> Authenticator; TGT -> Ticket Granting Ticket; Sk1, Sk2 -> Session Keys
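The six steps above can be simulated in miniature. This toy sketch is not real Kerberos: `seal` only authenticates its payload (the contents stay readable), whereas real Kerberos encrypts tickets; all key names and the user "alice" are hypothetical:

```python
import hmac
import hashlib
import json
import os

def seal(key: bytes, obj: dict) -> bytes:
    """Stand-in for symmetric encryption: HMAC tag + JSON body (not confidential)."""
    body = json.dumps(obj).encode()
    return hmac.new(key, body, hashlib.sha256).digest() + body

def unseal(key: bytes, blob: bytes) -> dict:
    tag, body = blob[:32], blob[32:]
    assert hmac.compare_digest(tag, hmac.new(key, body, hashlib.sha256).digest())
    return json.loads(body)

client_key  = b"client-long-term-key"    # shared: client <-> AS
tgs_key     = b"tgs-long-term-key"       # known only to AS and TGS
service_key = b"service-long-term-key"   # known only to TGS and server

# Steps 1-2: AS issues session key Sk1 and a TGT. The TGT is sealed under
# the TGS key, so the client can carry it but cannot read or forge it.
sk1 = os.urandom(16).hex()
tgt = seal(tgs_key, {"user": "alice", "sk1": sk1})
reply = seal(client_key, {"sk1": sk1})

# Steps 3-4: client presents the TGT; TGS recovers Sk1 from it and issues
# Sk2 plus a service ticket sealed under the service's key.
sk1_client = unseal(client_key, reply)["sk1"]
tgt_contents = unseal(tgs_key, tgt)
sk2 = os.urandom(16).hex()
ticket = seal(service_key, {"user": tgt_contents["user"], "sk2": sk2})

# Steps 5-6: server unseals the ticket with its own key, learning the
# authenticated user and the session key Sk2 for the session.
session = unseal(service_key, ticket)
print(session["user"])   # alice
```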

Slide 15

Kerberos advantages

A password never travels over the network. Only time-sensitive tickets travel over the network.

Passwords or secret keys are only known to the KDC and the principal.

Kerberos allows passwords or secret keys to be stored in a centralized, LDAP-compliant credential store. This makes it easy for administrators to manage the system and its users.

Servers don't have to store any tickets or any client-specific details to authenticate a client.

Slide 16

Hadoop Authorization with ACLs

Slide 17

HDFS Permissions ( ACLs )

HDFS supports a permission model equivalent to the traditional Unix permission model.

For each file or directory, permissions are managed for a set of three distinct user classes: owner, group, and others.

There are three different permissions controlled for each user class: read (r), write (w), and execute (x).

For files: the r permission is required to read the file, and the w permission is required to write or append to the file.

For directories: the r permission is required to list the contents of the directory, the w permission is required to create or delete files or directories, and the x permission is required to access a child of the directory.

Slide 18

HDFS Permissions ( ACLs )

Each client process that accesses HDFS has a two-part identity composed of the user name and the groups list.

Whenever HDFS must do a permissions check for a file or directory foo accessed by a client process:

1. If the user name matches the owner of foo, then the owner permissions are tested.

2. Else, if the group of foo matches any member of the groups list, then the group permissions are tested.

3. Otherwise, the other permissions of foo are tested.

4. If the permissions check fails, the client operation fails.
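The four steps above can be expressed directly in code. A minimal sketch (the FileStatus type and all names are illustrative, not the HDFS source):

```python
from typing import NamedTuple, Set

class FileStatus(NamedTuple):
    owner: str
    group: str
    mode: int          # octal permission bits, e.g. 0o750

def check_permission(status: FileStatus, user: str,
                     groups: Set[str], access: int) -> bool:
    """access is a 3-bit mask: 4 = read, 2 = write, 1 = execute."""
    if user == status.owner:                 # 1. owner class
        bits = (status.mode >> 6) & 0o7
    elif status.group in groups:             # 2. group class
        bits = (status.mode >> 3) & 0o7
    else:                                    # 3. others class
        bits = status.mode & 0o7
    return (bits & access) == access         # 4. operation fails if bits missing

foo = FileStatus(owner="alice", group="analysts", mode=0o750)
print(check_permission(foo, "alice", set(), 2))         # True: owner has w
print(check_permission(foo, "bob", {"analysts"}, 4))    # True: group has r
print(check_permission(foo, "bob", {"analysts"}, 2))    # False: group lacks w
print(check_permission(foo, "carol", set(), 4))         # False: others have no access
```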

Slide 19

ACLs Shell Commands

hdfs dfs -getfacl [-R] <path>

Displays the Access Control Lists (ACLs) of files and directories. If a directory has a default ACL, then getfacl also displays the default ACL.

hdfs dfs -setfacl [-R] [-b | -k | -m | -x <acl_spec> <path>] | [--set <acl_spec> <path>]

Sets Access Control Lists (ACLs) of files and directories.

hdfs dfs -ls <args>

The output of ls will append a ‘+’ character to the permissions string of any file or directory that has an ACL.
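A hedged usage sketch of these commands (the paths, user bob, and group analysts are hypothetical; a running HDFS with dfs.namenode.acls.enabled=true is assumed):

```shell
# Grant read access on a file to one extra user beyond owner/group/others
hdfs dfs -setfacl -m user:bob:r-- /data/sales.csv

# Set a default ACL on a directory so newly created children inherit it
hdfs dfs -setfacl -m default:group:analysts:r-x /data

# Inspect the result; note the '+' in the ls permission string
hdfs dfs -getfacl /data/sales.csv
hdfs dfs -ls /data
```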

Slide 20

DEMO

Slide 21

Hadoop Admin Responsibilities

Responsible for implementation and administration of Hadoop infrastructure.

Testing HDFS, Hive, Pig and MapReduce access for Applications.

Cluster maintenance tasks like Backup, Recovery, Upgrade, Patching.

Performance tuning and Capacity planning for Clusters.

Monitor Hadoop cluster and deploy security.

LIVE Online Class

Class Recording in LMS

24/7 Post Class Support

Module Wise Quiz

Project Work

Verifiable Certificate

Slide 22

How it Works?

Questions

Slide 23

Slide 24

Course Topics

Module 1 » Hadoop Cluster Administration

Module 2 » Hadoop Architecture and Cluster Setup

Module 3 » Hadoop Cluster: Planning and Managing

Module 4 » Backup, Recovery and Maintenance

Module 5 » Hadoop 2.0 and High Availability

Module 6 » Advanced Topics: QJM, HDFS Federation and Security

Module 7 » Oozie, HCatalog/Hive and HBase Administration

Module 8 » Project: Hadoop Implementation