shug meetup hops hadoop

Post on 13-Apr-2017


Multi-Tenant Hadoop-as-a-Service (for free!)

Jim Dowling

Associate Prof @ KTH

Senior Researcher @ SICS

CEO @ Hops AB

SHUG Meetup, Stockholm, April 21st 2016

www.hops.io @hopshadoop

(Some Slides by Prof. Tor Björn Minde, CEO SICS North Swedish ICT AB)


Talk Overview

• World’s First Open Data Centre for Big Data in Luleå

• Metadata in Hadoop

• True Multi-Tenancy for Hadoop

• DEMO: Spark/Flink/Hadoop-as-a-Service

Vision: SICS ICE research facility

A 2 MW datacenter research and test environment

Purpose: Increase knowledge, strengthen universities, companies and researchers

R&D institute, 5 lab modules, 3,000-4,000 servers, 2,000-3,000 square meters

What SICS ICE will offer

1. Compute capacity and tools for big data and cloud

• Hadoop/Spark/Flink-as-a-Service

2. Demonstration space for new products & solutions

3. Datacenter infrastructure for experiments and facility data

• Flexible lab modules and re-configuration

• Measurement equipment for energy, cooling, capacity

4. Competence for verticals and datacenter infrastructure

Status of SICS-ICE research facility

(ICE = Infrastructure and Cloud research Environment)

Phase 1 (1 room built)

• Establish test projects in a “room-in-room” commercial co-location facility

• Start of operation February 2016

• Officially launched in April 2016

Phase 2 (Design phase)

• Design of a flexible and general research facility summer-fall 2016

• Contracts with Akademiska Hus & E.ON

• Plan is to start build phase spring 2017

• Plan is to start installation fall 2017

• Plan is to start operation early 2018


Phase 1 room-in-room module 1


A Data Center Optimized for Hadoop

Dell servers from Hi5 in module 1

• 3600 cores

• 40 TB RAM

• Up to 7.5 petabyte storage

• 10/40 Gb/s network

• Separate management network

Hadoop-as-a-Service on SICS ICE

But First… MetaData in Hadoop

Metadata Totem Poles in Hadoop

Eventual Consistency

With Many Hadoop Clusters

[Diagram: Cluster 1 through Cluster N, each with its own MetaData Service, feeding a MetaData Service (Aggregator)]

Eventually consistent MetaData aggregated using more eventually consistent protocols.

MetaData in Hops Hadoop

[Diagram labels: HDFS, YARN, NDB, Projects, DataSets, Users, Provenance, Search, History, Custom MetaData]

Case Study: Access Control as a MetaData Service

Access Control in Relational Databases

# Multi-tenancy for alice and bob on db1 and db2
GRANT ALL PRIVILEGES ON db1.* TO 'alice'@'%';
GRANT ALL PRIVILEGES ON db2.* TO 'bob'@'%';

# More fine-grained privileges
GRANT SELECT ON db2.sensitiveTable TO 'alice'@'192.168.1.2';

Databases ensure the consistency of security and policies using foreign keys.

“drop table db2.sensitiveTable” => delete associated privileges
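The foreign-key idea on this slide can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not how any particular database engine stores its grants: the `tables_catalog` and `privileges` tables are hypothetical, and dropping a table is modelled as deleting its row from the hypothetical catalog.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE tables_catalog (
        name TEXT PRIMARY KEY
    );
    CREATE TABLE privileges (
        grantee    TEXT NOT NULL,
        table_name TEXT NOT NULL
            REFERENCES tables_catalog(name) ON DELETE CASCADE,
        privilege  TEXT NOT NULL
    );
""")

conn.execute("INSERT INTO tables_catalog VALUES ('db2.sensitiveTable')")
conn.execute(
    "INSERT INTO privileges VALUES ('alice', 'db2.sensitiveTable', 'SELECT')"
)


def privilege_count(connection):
    """Return the number of privilege rows currently stored."""
    return connection.execute("SELECT COUNT(*) FROM privileges").fetchone()[0]


before = privilege_count(conn)
# Removing the catalog entry stands in for "drop table db2.sensitiveTable".
conn.execute("DELETE FROM tables_catalog WHERE name = 'db2.sensitiveTable'")
after = privilege_count(conn)
print(before, after)
```

Because the privilege row references the catalog row with ON DELETE CASCADE, the database itself removes the privilege, so no orphaned policy can outlive the data it protects.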


Access Control in Hadoop: Apache Sentry

How do you ensure the consistency of the policies and the data?

[Mujumdar’15]


Policy Editor for Sentry

Administrators administer privileges for users


Problem: Sensitive Data needs its own Cluster

[Diagram: Alice has access to both the NSA DataSet and the User DataSet]

Alice can copy/cross-link between data sets.

Alice has only one Kerberos identity. Neither attribute-based access control nor dynamic roles are supported in Hadoop.

Solution: Project-Specific UserIDs

[Diagram: NSA__Alice is a member of Project NSA; Users__Alice is a member of Project Users]

HDFS enforces access control

How can we share DataSets between Projects?
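The naming scheme on this slide (NSA__Alice, Users__Alice) can be sketched as follows. The double-underscore separator is taken from the slide; the function names and the rule that names may not contain the separator are assumptions made for illustration, not the HopsWorks implementation.

```python
SEPARATOR = "__"  # as seen in the slide's NSA__Alice / Users__Alice examples


def project_user(project: str, user: str) -> str:
    """Build a project-specific user name such as 'NSA__Alice'."""
    # Assumption: names must not contain the separator, so the
    # combined name can always be split back unambiguously.
    if SEPARATOR in project or SEPARATOR in user:
        raise ValueError("names must not contain the separator")
    return f"{project}{SEPARATOR}{user}"


def split_project_user(name: str) -> tuple:
    """Recover (project, user) from a project-specific user name."""
    project, user = name.split(SEPARATOR, 1)
    return project, user


nsa_identity = project_user("NSA", "Alice")
users_identity = project_user("Users", "Alice")
print(nsa_identity, users_identity)
```

Since the file system only ever sees one of these identities at a time, a job running as NSA__Alice has no rights over files owned by Users__Alice, even though both belong to the same person.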


Sharing DataSets with HopsWorks

[Diagram: NSA__Alice is a member of Project NSA; Users__Alice is a member of Project Users; a project owns the DataSet]

Add members of Project NSA to the DataSet group

Web Application Enforces Dynamic Roles
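Sharing by group membership can be sketched with a small in-memory model. The group name `Users__DataSet` and the assumption that Project Users owns the DataSet are hypothetical; the slide only states that members of Project NSA are added to the DataSet group.

```python
from collections import defaultdict

# group name -> set of project-specific user names (all names are illustrative)
group_members = defaultdict(set)


def add_member(group: str, user: str) -> None:
    group_members[group].add(user)


def can_access(user: str, dataset_group: str) -> bool:
    """A user may access the DataSet only if it belongs to the DataSet's group."""
    return user in group_members[dataset_group]


def share_dataset(dataset_group: str, project_group: str) -> None:
    """Share a DataSet by adding every member of a project to the DataSet's group."""
    group_members[dataset_group] |= group_members[project_group]


add_member("NSA", "NSA__Alice")
add_member("Users", "Users__Alice")
add_member("Users__DataSet", "Users__Alice")  # assumed owner: Project Users

before = can_access("NSA__Alice", "Users__DataSet")
share_dataset("Users__DataSet", "NSA")
after = can_access("NSA__Alice", "Users__DataSet")
print(before, after)
```

The project-specific identities stay separate throughout; only the DataSet's group grows, which is what lets ordinary group permissions do the enforcement.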


[Diagram labels: Alice@gmail.com authenticates with HopsWorks; project-specific users NSA__Alice and Users__Alice; Projects; Secure Impersonation; HopsFS; HopsYARN]

User

• Authentication Provider

- JDBC Realm

- 2-Factor Authentication

- LDAP

Project

• Users

- Roles: Owner, Data Scientist

• DataSets

- Home project

- Can be shared

Project Roles

• Data Owner Privileges

- Import/Export data

- Manage Membership

- Share DataSets

• Data Scientist Privileges

- Write code

- Run code

- Request access to DataSets

We delegate administration of privileges to users

Per Project CPU and Storage Quotas

• 300 GB per Project

• 1000 CPU mins

• Uber-Style Pricing

- Elastic Demand Curve
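A per-project quota check could look roughly like the sketch below. The 300 GB and 1000 CPU-minute limits come from the slide; the class, its method names, and the `price_multiplier` parameter are hypothetical. The multiplier is one possible reading of "Uber-style pricing", where CPU time costs more quota when demand is high; the slide does not say how the price is actually computed.

```python
from dataclasses import dataclass


class QuotaExceeded(Exception):
    """Raised when a request would exceed the project's quota."""


@dataclass
class ProjectQuota:
    storage_gb: float = 300.0     # per-project storage quota from the slide
    cpu_minutes: float = 1000.0   # per-project CPU quota from the slide
    used_storage_gb: float = 0.0
    used_cpu_minutes: float = 0.0

    def can_store(self, gb: float) -> bool:
        """Check whether writing `gb` more data stays within the storage quota."""
        return self.used_storage_gb + gb <= self.storage_gb

    def charge_cpu(self, minutes: float, price_multiplier: float = 1.0) -> float:
        """Charge CPU time against the quota and return the remaining minutes.

        `price_multiplier` is a hypothetical demand-based factor (>= 1.0 when
        the cluster is busy); it is not taken from the source.
        """
        cost = minutes * price_multiplier
        if self.used_cpu_minutes + cost > self.cpu_minutes:
            raise QuotaExceeded("not enough CPU minutes left")
        self.used_cpu_minutes += cost
        return self.cpu_minutes - self.used_cpu_minutes


quota = ProjectQuota()
remaining = quota.charge_cpu(100, price_multiplier=2.0)  # busy cluster: 100 min cost 200
print(remaining)
```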


Sharing DataSets between Projects

The same as Sharing Folders in Dropbox


Delegate Access Control to HDFS

• HDFS enforces access control

- UserID per Project

- GroupID per Project and DataSet

• Metadata Integrity using Foreign Keys

- Removing a project removes all users, groups, extended metadata, and (optionally) DataSets.

Free Text Search with Consistent Metadata

[Diagram labels: Free-Text Search, Distributed Database, ElasticSearch, MetaData Designer, MetaData Entry]

The Distributed Database is the Single Source of Truth.

Foreign keys ensure the integrity of Metadata.

The NoteBook Proxy Wars

Demo


Short-Term RoadMap

• Multi-tenant Kafka

- Per-project Topics

• Oozie Workflow Editor

• Genomics Support with Adam/Spark

• Tiered Storage: Hot Data, Normal, Archived

• Improved Data Ingress

- Sharing Public DataSets Globally using P2P technology

The Team

Active: Jim Dowling, Seif Haridi, Tor Björn Minde, Gautier Berthou, Salman Niazi, Mahmoud Ismail, Kamal Hakimzadeh, Ermias Gebremeskel, Theofilos Kakantousis, Johan Svedlund Nordström, Someya Sayeh, Vasileios Giannokostas, Antonios Kouzoupis, Misganu Dessalegn, Rizvi Hasan, Ahmad Al-Shishtawy, Ali Gholami, Paul Mälzer.

Alumni: K. “Sri” Srijeyanthan, Steffen Grohsschmiedt, Alberto Lorente, Andre Moré, Stig Viaene, Hooman Peiro, Evangelos Savvidis, Jude D’Souza, Qi Qi, Gayana Chandrasekara, Nikolaos Stanogias, Daniel Bali, Ioannis Kerkinos, Peter Buechler, Pushparaj Motamari, Hamid Afzali, Wasif Malik, Lalith Suresh, Mariano Valles, Ying Lieu.

Hops

Conclusions

• HopsWorks is providing a world first: Hadoop-as-a-Service to researchers and industry.

• Workshop on 12th May, 17.30 – 20.00 in SICS, 6th Floor of the Electrum Building, Kista. Register at www.hops.io/?q=news

• Join the team – talk to me!

www.hops.io

www.hops.site
