shug meetup hops hadoop

33
Multi-Tenant Hadoop-as-a-Service (for free!) Jim Dowling Associate Prof @ KTH Senior Researcher @ SICS CEO @ Hops AB SHUG Meetup, Stockholm, April 21 st 2016 www.hops.io @hopshadoop (Some Slides by Prof. Tor Björn Minde, CEO SICS North Swedish ICT AB)

Upload: jim-dowling

Post on 13-Apr-2017

48 views

Category:

Technology


1 download

TRANSCRIPT

Page 1: Shug meetup Hops Hadoop

Multi-Tenant Hadoop-as-a-Service (for free!)

Jim Dowling Associate Prof @ KTH

Senior Researcher @ SICSCEO @ Hops AB

SHUG Meetup, Stockholm, April 21st 2016

www.hops.io @hopshadoop

(Some Slides by Prof. Tor Björn Minde, CEO SICS North Swedish ICT AB)

Page 2: Shug meetup Hops Hadoop
Page 3: Shug meetup Hops Hadoop

3

Talk Overview

• World’s First Open Data Centre for Big Data in Luleå

• Metadata in Hadoop

• True Multi-Tenancy for Hadoop

• DEMO: Spark/Flink/Hadoop-as-a-Service

Page 4: Shug meetup Hops Hadoop

4

Vision SICS ICE research facilityA 2 MW datacenter research and test environment

Purpose: Increase knowledge, strengthen universities, companies and researchers

R&D institute, 5 lab modules, 3-4000 servers, 2-3000 square meters

Page 5: Shug meetup Hops Hadoop

5

What SICS ICE will offer1. Compute capacity and tools for big data and cloud

• Hadoop/Spark/Flink-as-a-Service

2. Demonstration space for new products & solutions

3. Datacenter infrastructure for experiments and facility data• Flexible lab modules and re-configuration• Measurement equipment for energy, cooling, capacity

4. Competence for verticals and datacenter infrastructure

Page 6: Shug meetup Hops Hadoop

6

Status of SICS-ICE research facility(ICE = Infrastructure and Cloud research Environment)

Phase 1 (1 room built)• Establish test projects in a “room-in-

room” commercial co-location facility • Start of operation February 2016• Officially Launched in April 2016

Phase 2 (Design phase) • Design of a flexible and general research

facility summer-fall 2016• Contracts with Akademiska Hus & E.ON• Plan is to start build phase Spring 2017• Plan is to start installation fall 2017• Plan is to start operation early 2018

Page 7: Shug meetup Hops Hadoop

7

Phase 1 room-in-room module 1

Page 8: Shug meetup Hops Hadoop

8

A Data Center Optimized for Hadoop

Dell servers from Hi5 in module 1

• 3600 cores• 40 TB RAM• Up to 7.5 petabyte storage• 10/40 Gb/s network• Separate management network

Page 9: Shug meetup Hops Hadoop

Hadoop-as-a-Service on SICS ICE

9

Page 10: Shug meetup Hops Hadoop

But First…. MetaData in Hadoop

10

Page 11: Shug meetup Hops Hadoop

Metadata Totem Poles in Hadoop

11Eventual Consistency

Page 12: Shug meetup Hops Hadoop

12

With Many Hadoop Clusters

Cluster 1 Cluster N

MetaDataService

MetaDataService

MetaData Service (Aggregator)

Eventually consistent MetaData aggregated using moreeventually consistent protocols.

Page 13: Shug meetup Hops Hadoop

MetaData in Hops Hadoop

HDFSYARN

NDB

ProjectsDataSets

Users

ProvenanceSearch

HistoryCustomMetaData

13

Page 14: Shug meetup Hops Hadoop

Case Study: Access Control as a MetaData Service

14

Page 15: Shug meetup Hops Hadoop

15

Access Control in Relational Databases# Multi-tenancy for alice and bob on db1 and db2

grant all privileges on db1.* to ‘alice'@‘%‘;grant all privileges on db2.* to ‘bob'@‘%‘;

#More fine-grained privilegesgrant SELECT privileges on db2.sensitiveTable to ‘alice'@‘192.168.1.2‘;

Databases ensure the consistency of security and policies using foreign keys.

“drop table db2.sensitiveTable” => delete associated privileges

Page 16: Shug meetup Hops Hadoop

16

Access Control in Hadoop: Apache Sentry

How do you ensure the consistency of the policies and the data?

[Mujumdar’15]

Page 17: Shug meetup Hops Hadoop

17

Policy Editor for Sentry

Administrators administer privileges for users

Page 18: Shug meetup Hops Hadoop

18

Problem: Sensitive Data needs its own Cluster

NSA DataSet

User DataSet

has access to

has access to

Alice can copy/cross-link between data sets

Alice has only one Kerberos Identity. Neither attribute-based access control nor dynamic roles supported in Hadoop.

Alice

Page 19: Shug meetup Hops Hadoop

19

Solution: Project-Specific UserIDs

Project NSA

Project UsersMember of

NSA__Alice

Users__Alice

Member of

HDFS enforcesaccess control

How can we share DataSets between Projects?

Page 20: Shug meetup Hops Hadoop

20

Sharing DataSets with HopsWorks

Project NSA

Project UsersMember of

DataSetowns

Add members of Project NSA to the DataSet group

NSA__Alice

Users__Alice

Member of

Page 21: Shug meetup Hops Hadoop

Web Application Enforces Dynamic Roles

21

[email protected]

NSA__Alice

Authenticate

Users__Alice

HopsWorks

HopsFS

HopsYARN

Projects

SecureImpersonation

Page 22: Shug meetup Hops Hadoop

22

User• Authentication Provider

- JDBC Realm- 2-Factor Authentication- LDAP

Page 23: Shug meetup Hops Hadoop

23

Project• Users

- Roles: Owner, Data Scientist

• DataSets - Home project- Can be shared

Page 24: Shug meetup Hops Hadoop

24

Project Roles• Data Owner Privileges

- Import/Export data- Manage Membership- Share DataSets

• Data Scientist Privileges- Write code- Run code- Request access to DataSets

We delegate administration of privileges to users

Page 25: Shug meetup Hops Hadoop

25

Per Project CPU and Storage Quotas• 300 GB per Project

• 1000 CPU mins

• Uber-Style Pricing- Elastic Demand Curve

Page 26: Shug meetup Hops Hadoop

27

Sharing DataSets between Projects

The same as Sharing Folders in Dropbox

Page 27: Shug meetup Hops Hadoop

28

Delegate Access Control to HDFS• HDFS enforces access control- UserID per Project- GroupID per

Project and DataSet

• Metadata Integrity using Foreign Keys- Removing a project removes

all users, groups, extended metadata, and (optionally) DataSets.

Page 28: Shug meetup Hops Hadoop

29

Free Text Search with Consistent Metadata

Free-Text Search

Distributed DatabaseElasticSearch

The Distributed Database is the Single Source of Truth.Foreign keys ensure the integrity of Metadata.

MetaDataDesigner

MetaDataEntry

Page 29: Shug meetup Hops Hadoop

30

The NoteBook Proxy Wars

Page 30: Shug meetup Hops Hadoop

Demo

31

Page 31: Shug meetup Hops Hadoop

32

Short-Term RoadMap• Multi-tenant Kafka

- Per-project Topics

• Oozie Workflow Editor

• Genomics Support with Adam/Spark

• Tiered Storage: Hot Data, Normal, Archived

• Improved Data Ingress- Sharing Public DataSets Globally using P2P technology

Page 32: Shug meetup Hops Hadoop

The TeamActive: Jim Dowling, Seif Haridi, Tor Björn Minde,

Gautier Berthou, Salman Niazi, Mahmoud Ismail,Kamal Hakimzadeh, Ermias Gebremeskel, Theofilos Kakantousis, Johan Svedlund Nordström, Someya Sayeh, Vasileios Giannokostas, Antonios Kouzoupis, Misganu Dessalegn, Rizvi Hasan,Ahmad Al-Shishtawy, Ali Gholami, Paul Mälzer.

Alumni: K. “Sri” Srijeyanthan, Steffen Grohsschmiedt, Alberto Lorente, Andre Moré, Stig Viaene, Hooman Peiro, Evangelos Savvidis, Jude D’Souza, Qi Qi, Gayana Chandrasekara,Nikolaos Stanogias, Daniel Bali, Ioannis Kerkinos,Peter Buechler, Pushparaj Motamari, Hamid Afzali,Wasif Malik, Lalith Suresh, Mariano Valles, Ying Lieu.

Hops

Page 33: Shug meetup Hops Hadoop

34

Conclusions• HopsWork is providing a world’s first: Hadoop-as-a-Service to researchers and industry.

• Workshop on 12th May, 17.30 – 20.00 in SICS, 6th Floor of the Electrum Building, Kista.Register at www.hops.io/?q=news

• Join the team – talk to me!

www.hops.iowww.hops.site