
Page 1: Cloud Computing:  hadoop  Security Design - 2009

CLOUD COMPUTING: HADOOP SECURITY DESIGN - 2009

Kaveh Noorbakhsh
Kent State: CS

Owen O’Malley | Kan Zhang | Sanjay Radia | Ram Marti | Christopher Harrell | Yahoo!

*ALL OPINIONS AND INFORMATION ARE MINE AND DO NOT REPRESENT THE VIEW(S) OF MY EMPLOYER

Page 2: Cloud Computing:  hadoop  Security Design - 2009

Brief History: Cloud Computing as a Service

1961

•John McCarthy introduces the concept of cloud computing as a business model

1969

•ARPANET

1997

•“Cloud Computing” coined by Ramnath Chellappa

1999

•Salesforce.com – enterprise applications via a simple web interface

2002

•Amazon Web Services

2004

•HDFS & Map/Reduce in Nutch

2006

•Google Docs

•Amazon EC2

•Yahoo hires Doug Cutting

2008

•Eucalyptus – 1st open-source AWS API for private clouds

•OpenNebula – private and hybrid clouds

•Hadoop hits web scale

2009

•MS Azure

•Amazon RDS – MySQL supported

2011

•Amazon RDS supports Oracle

•Office 365

Page 3: Cloud Computing:  hadoop  Security Design - 2009

Hadoop – Funny Name, Big Impact

Page 4: Cloud Computing:  hadoop  Security Design - 2009

Map/Reduce: An Introduction

Map/Reduce allows computation to scale out over many “cheap” systems rather than one expensive supercomputer.
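To make the programming model concrete, here is a minimal single-machine sketch in Python (not Hadoop's Java API) of the canonical word-count job: the map step emits (word, 1) pairs and the reduce step sums them per word.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input document."""
    for word in document.split():
        yield (word.lower(), 1)

def reduce_phase(pairs):
    """Reduce: sum the counts for each distinct word (the shuffle is implicit here)."""
    counts = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return dict(counts)

if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "the fox"]
    # On a real cluster each document (or file split) would be mapped on a different node.
    mapped = (pair for doc in docs for pair in map_phase(doc))
    print(reduce_phase(mapped))  # {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
```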

Page 5: Cloud Computing:  hadoop  Security Design - 2009

Divide and Conquer

“Work” is partitioned into pieces w1, w2, w3; a “worker” processes each piece into a partial result r1, r2, r3; the partial results are combined into the final “Result”.

Partition → workers → Combine
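A minimal sketch of this partition/worker/combine flow in Python, using a process pool to stand in for the “cheap” machines from the previous slide; the sum-of-squares task and the pool size of three are arbitrary illustrative choices.

```python
from multiprocessing import Pool

def worker(chunk):
    """Each worker handles one partition of the work, here summing squares of its chunk."""
    return sum(x * x for x in chunk)

def partition(data, n_workers):
    """Split the work into roughly equal chunks, one per worker."""
    size = (len(data) + n_workers - 1) // n_workers
    return [data[i:i + size] for i in range(0, len(data), size)]

if __name__ == "__main__":
    work = list(range(1, 1001))
    chunks = partition(work, n_workers=3)           # "Partition"
    with Pool(processes=3) as pool:
        partial_results = pool.map(worker, chunks)  # w1, w2, w3 -> r1, r2, r3
    print(sum(partial_results))                     # "Combine" -> 333833500
```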

Page 6: Cloud Computing:  hadoop  Security Design - 2009

Two Layers

MapReduce: code runs here

HDFS: data lives here

Page 7: Cloud Computing:  hadoop  Security Design - 2009

Advantages of the Cloud

Database as a Service = DBaaS

Infrastructure as a Service = IaaS

Software as a Service = SaaS

Platform as a Service = PaaS

Share hardware and energy costs

Share employee costs

Fast spin-up and tear-down

Expand quickly to meet demands

Costs ideally proportional to usage

Scalability

Page 8: Cloud Computing:  hadoop  Security Design - 2009

Cloud Services Spending

[Chart: cloud services revenue, in billions of dollars, for 2009, 2010, and 2014.]

Page 9: Cloud Computing:  hadoop  Security Design - 2009

Cloud vs Total IT Spending

[Chart: cloud services revenue vs. total IT expenditures, in billions of dollars, for 2009, 2010, and 2014.]

Page 10: Cloud Computing:  hadoop  Security Design - 2009

Security Challenges of the Cloud

Where is my data living?
You may not know exactly where your data is, since it can be distributed across many physical disks.

Where is my data going?
In the cloud, and especially in map/reduce, data is constantly moving from node to node, and the nodes may span multiple mini-clouds.

Who has access to my data?
There may be other clients using the cloud, as well as administrators and others who maintain it, who could access the data if it is not properly protected.

Page 11: Cloud Computing:  hadoop  Security Design - 2009

Hadoop Security Concerns

Hadoop services do not authenticate users or other services.

(a) A user can access an HDFS or MapReduce cluster as any other user. This makes it impossible to enforce access control in an uncooperative environment; for example, file permission checking on HDFS can be easily circumvented (see the sketch after this list).

(b) An attacker can masquerade as a Hadoop service. For example, user code running on a MapReduce cluster can register itself as a new TaskTracker.

DataNodes do not enforce any access control on their data blocks. An unauthorized client can read a data block as long as it can supply the block ID, and anyone can write arbitrary data blocks to DataNodes.
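To make concern (a) concrete, here is a deliberately naive toy sketch (hypothetical names, not Hadoop's actual RPC or filesystem code): when identity is simply whatever the client claims, permission checks protect nothing.

```python
# Toy model of pre-security Hadoop: the server trusts whatever username the client
# sends, so HDFS-style permission checks can be bypassed by claiming to be someone else.
FILES = {"/secure/salaries.csv": {"owner": "alice"}}

def open_file(path, claimed_user):
    """Permission check based purely on a client-supplied identity string."""
    if claimed_user != FILES[path]["owner"]:
        raise PermissionError(f"{claimed_user} may not read {path}")
    return f"<contents of {path}>"

# A malicious user never needs to break anything; they just lie about who they are.
print(open_file("/secure/salaries.csv", claimed_user="alice"))  # granted, no credential checked
```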

Page 12: Cloud Computing:  hadoop  Security Design - 2009

Security Requirements for Hadoop

Users are only allowed to access HDFS files that they have permission to access.

Users are only allowed to access or modify their own MapReduce jobs.

User-to-service mutual authentication to prevent unauthorized NameNodes, DataNodes, JobTrackers, or TaskTrackers.

Service-to-service mutual authentication to prevent unauthorized services from joining a cluster’s HDFS or MapReduce service.

The degradation of performance should be no more than 3%.

Page 13: Cloud Computing:  hadoop  Security Design - 2009

Proposed Solution – Use Case 1: Accessing Data

1) User/App requests access to a data block.

2) Name Node authenticates and gives the user a block token.

3) User/App uses block token on Data Node to access block for READ, WRITE, COPY or REPLACE.
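A minimal sketch of the block-token idea, assuming (as the design does) a secret key shared between the NameNode and DataNodes; the field layout, lifetime, and the HMAC-SHA1 choice here are illustrative, not the exact wire format.

```python
import hashlib
import hmac
import time

# Hypothetical key shared between the NameNode and the DataNodes
# (rolled periodically in the real design).
SHARED_KEY = b"namenode-datanode-shared-secret"

def issue_block_token(user, block_id, modes, lifetime_s=600):
    """NameNode side: bind the user, block ID, and allowed access modes into a signed token."""
    expiry = int(time.time()) + lifetime_s
    payload = f"{user}|{block_id}|{','.join(modes)}|{expiry}".encode()
    signature = hmac.new(SHARED_KEY, payload, hashlib.sha1).hexdigest()
    return payload, signature

def verify_block_token(payload, signature, requested_block, requested_mode):
    """DataNode side: check the signature, expiry, block ID, and access mode locally."""
    expected = hmac.new(SHARED_KEY, payload, hashlib.sha1).hexdigest()
    if not hmac.compare_digest(signature, expected):
        return False
    _user, block_id, modes, expiry = payload.decode().split("|")
    return (int(expiry) > time.time()
            and block_id == requested_block
            and requested_mode in modes.split(","))

# 1) client asks the NameNode for blk_42, 2) NameNode authenticates it and issues a READ token,
# 3) client presents the token to the DataNode, which grants READ but refuses WRITE.
payload, signature = issue_block_token("alice", "blk_42", ["READ"])
print(verify_block_token(payload, signature, "blk_42", "READ"))   # True
print(verify_block_token(payload, signature, "blk_42", "WRITE"))  # False
```

The point of the shared key is that the DataNode can verify a token locally instead of calling back to the NameNode on every block access.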

Page 14: Cloud Computing:  hadoop  Security Design - 2009

Proposed Solution – Use Case 2: Submitting Jobs

1) A user may obtain a delegation token through Kerberos.

2) Token given to user jobs for subsequent authentication to NameNode as the user.

3) Jobs can use the delegation token to access data that the user/app has access to.
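In the same hedged spirit, a sketch of the delegation-token flow: the user authenticates once (via Kerberos) and obtains a token derived from a NameNode master secret, which the user's tasks can later present instead of carrying Kerberos credentials around the cluster. Renewal, cancellation, and the exact identifier fields are simplified away here.

```python
import hashlib
import hmac
import time

# Hypothetical NameNode master secret; field names and lifetimes are illustrative only.
NAMENODE_MASTER_KEY = b"namenode-delegation-master-key"

def issue_delegation_token(owner, renewer, lifetime_s=24 * 3600):
    """Issued after the user's initial Kerberos authentication to the NameNode."""
    expiry = int(time.time()) + lifetime_s
    identifier = f"{owner}|{renewer}|{expiry}".encode()
    password = hmac.new(NAMENODE_MASTER_KEY, identifier, hashlib.sha1).hexdigest()
    return identifier, password

def authenticate_with_token(identifier, password):
    """NameNode side: recompute the HMAC and check the expiry; no Kerberos round trip needed."""
    expected = hmac.new(NAMENODE_MASTER_KEY, identifier, hashlib.sha1).hexdigest()
    if not hmac.compare_digest(password, expected):
        return None
    owner, _renewer, expiry = identifier.decode().split("|")
    return owner if int(expiry) > time.time() else None

# The job client obtains the token once; every task in the job then reuses it to act as "alice".
identifier, password = issue_delegation_token(owner="alice", renewer="jobtracker")
print(authenticate_with_token(identifier, password))  # 'alice'
```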

Page 15: Cloud Computing:  hadoop  Security Design - 2009

Core Principles Analysis: Confidentiality

Users/apps can only reach the data blocks they are entitled to, via block tokens. – Pass

Page 16: Cloud Computing:  hadoop  Security Design - 2009

Core Principles Analysis: Integrity

Data is only served at the block level if the block token matches. – Pass

The data itself is assumed to be good; the blocks themselves are not checked. – Fail

Page 17: Cloud Computing:  hadoop  Security Design - 2009

Core Principles Analysis: Availability

The JobTracker and NameNode are single points of failure for the system. – Fail

Tokens persist for a short period of time, so the system is resilient to brief outages of the NameNode and JobTracker. – Pass

Page 18: Cloud Computing:  hadoop  Security Design - 2009

Conclusion

The token method of authentication for both data and process access makes sense in a highly distributed system like Hadoop. However, because tokens carry so much authority and are not constantly re-checked, the design is open to serious TOCTOU (time-of-check to time-of-use) attacks, as sketched below.
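The TOCTOU window can be seen in a few lines: permissions are checked when a token is issued, not when it is used, so a revocation that happens in between goes unnoticed until the token expires. A self-contained toy illustration (hypothetical names, not Hadoop code):

```python
import time

# Permissions are consulted only when the token is issued, never when it is used.
permissions = {("alice", "/secure/data"): True}

def issue_token(user, path, lifetime_s=600):
    """Check time: the permission lookup happens here, once."""
    if not permissions.get((user, path)):
        raise PermissionError("denied at issue time")
    return {"user": user, "path": path, "expires": time.time() + lifetime_s}

def use_token(token, path):
    """Use time: only the token itself is inspected; permissions are NOT re-checked."""
    return token["path"] == path and token["expires"] > time.time()

token = issue_token("alice", "/secure/data")      # alice is authorized at check time
permissions[("alice", "/secure/data")] = False    # an admin revokes her access...
print(use_token(token, "/secure/data"))           # ...yet the token still works: True
```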

Compared to the current model (i.e., no security), this represents a major step forward.

Page 19: Cloud Computing:  hadoop  Security Design - 2009

THE END

Questions?