Post on 29-May-2020


Page 1: Data Lake Organization - azurebootcampdk.comazurebootcampdk.com/Presentations/DataLake-Organize.pdf · Data Lake Analytics Service IoT Hub Data Catalog Power BI Embedded Data Lake

Data Lake Organization

A Hadoop Eco-System

Jan Cordtz, Microsoft Denmark

[email protected]

Cloud Solution Architect


42 Azure regions

Hyper-scale infrastructure

100+ datacenters across 42 regions worldwide

Learn more: Microsoft.com/datacenter

Top 3 networks in the world


Azure services shown on the slide:

Azure Search, Hybrid Cloud, Backup, StorSimple, Azure Site Recovery, Import/Export, Azure AD Health Monitoring, AD Privileged Identity Management, Operational Analytics, Domain Services

SQL Database, DocumentDB, Redis Cache, Storage Tables, SQL Data Warehouse, SQL Server Stretch Database

Visual Studio, Application Insights, VS Team Services, Xamarin, HockeyApp, Mobile Engagement

Cognitive Services, Bot Framework, Cortana

Security & Management: Azure Active Directory, Multi-Factor Authentication, Automation, Portal, Key Vault, Store/Marketplace, VM Image Gallery & VM Depot, Azure AD B2C, Scheduler, Security Center

Web Apps, Mobile Apps, API Apps, Notification Hubs, Cloud Services, Service Fabric, Functions, Batch, RemoteApp, Container Service, VM Scale Sets

BizTalk Services, Service Bus, Logic Apps, API Management, Content Delivery Network, Media Services, Media Analytics

HDInsight, Machine Learning, Stream Analytics, Data Factory, Event Hubs, Data Lake Analytics Service, IoT Hub, Data Catalog, Power BI Embedded, Data Lake Store

Layers: Data Center, Infrastructure as a Service, Platform as a Service


Trusted

Certification categories on the slide: Global, Industry, Regional

HIPAA / HITECH Act, FERPA, GxP 21 CFR Part 11, ISO 27001, SOC 1 Type 2, ISO 27018, CSA STAR Self-Assessment, Singapore MTCS, UK G-Cloud, Australia IRAP/CCSL, FISC Japan, New Zealand GCIO, China GB 18030, EU Model Clauses, ENISA IAF, Argentina PDPA, Japan CS Mark Gold, CDSA, Shared Assessments, Japan My Number Act, FACT UK, GLBA, Spain ENS, PCI DSS Level 1, MARS-E, FFIEC, China TRUCS, SOC 2 Type 2, SOC 3, Canada Privacy Laws, MPAA, Privacy Shield, ISO 22301, India MeitY, Germany IT Grundschutz workbook, Spain DPA, CSA STAR Certification, CSA STAR Attestation, HITRUST, IG Toolkit UK, China DJCP, ISO 27017

More certifications than any cloud provider


Analytics, Data, Cloud

How to (easily) disrupt a Data Warehouse


Digital transformation/disruption

Planning is dead


Open and hybrid


How to use and organize…



How to do it?

[Diagram labels: Innovation - Mode 2; Dev, Test, Pre-Prod, Prod; Repository like VSTS; Data Ingestion; Blob/Disk; Data Lake; SQL; Cube; Functionality]


Storage – from a functionality point of view

[Diagram labels: File storage, Data Lake, Database, Cube; axis: Functionality and cost; ETL / Data Factory (shown twice)]


Principles regarding the Organization

• Is very simple to use for an end user or application.

• Is as cost-effective as sensible/possible.

• Does not compromise security.

• Fits well into a DevOps scenario.

• Supports both of Gartner’s Mode 1 and Mode 2 implementation scenarios.

• Has a well-defined path for the information needed to support an effective auditing and logging process.


Organizing the Azure Data Lake

[Diagram labels: Data Ingestion, Azure Data Lake, Landing Zone, Work, Publish, Analytics, Archive, Data Catalog; flows: Copy, Transform, Transform & Anonymize]

Landing Zone

Landing Zone System Account(s): read/write

Work System Account(s): read

Work

Work System Account(s): read/write

Publish (folder per "something")

Users in Groups: Read Only

Work System Account(s): Read/Write

Analytics ("All data")

Users in Groups: Read Only

Work System Account(s): Read/Write
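
The zone permissions on this slide can be sketched as a small lookup table. This is only an illustration in Python: the principal names (`landing_system`, `work_system`, `user_groups`) and the `is_allowed` helper are hypothetical, and a real Azure Data Lake would enforce the rules through Azure Active Directory identities and folder ACLs rather than application code.

```python
from enum import Flag, auto


class Access(Flag):
    NONE = 0
    READ = auto()
    WRITE = auto()


# Zone -> principal -> access, transcribed from the slide.
# Principal names are hypothetical labels, not real Azure AD identities.
ZONE_ACCESS = {
    "landing": {
        "landing_system": Access.READ | Access.WRITE,
        "work_system": Access.READ,
    },
    "work": {
        "work_system": Access.READ | Access.WRITE,
    },
    "publish": {
        "work_system": Access.READ | Access.WRITE,
        "user_groups": Access.READ,
    },
    "analytics": {
        "work_system": Access.READ | Access.WRITE,
        "user_groups": Access.READ,
    },
}


def is_allowed(principal: str, zone: str, wanted: Access) -> bool:
    """True if `principal` holds every permission in `wanted` for `zone`."""
    granted = ZONE_ACCESS.get(zone, {}).get(principal, Access.NONE)
    return (granted & wanted) == wanted
```

For example, `is_allowed("user_groups", "publish", Access.WRITE)` is false, which matches the "Read Only – except Work System Account(s)" label on the Publish and Analytics zones.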


Data Ingestion

Sources: Database, FTP, File Storage

Tools: SSIS, Data Factory

"Gatekeeper": Firewall, AD control

"Are you allowed to come in?"

Validation

"Is the content you are coming with in accordance with what we have agreed?"

Standardization

Items like: date formats (yyyymmdd), number formats (,. or .,)
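
The standardization step named on this slide (dates as yyyymmdd, numbers with either `,` or `.` as the decimal separator) could look roughly like the Python sketch below. The list of accepted input date formats and both function names are assumptions made for illustration. The slide itself names SSIS and Data Factory as the tools, so this shows only the logic, not how those tools implement it.

```python
from datetime import datetime
from decimal import Decimal, InvalidOperation

# Hypothetical set of input date layouts the gatekeeper agrees to accept.
ACCEPTED_DATE_FORMATS = ("%Y%m%d", "%Y-%m-%d", "%d-%m-%Y", "%d/%m/%Y")


def standardize_date(raw: str) -> str:
    """Return the date as yyyymmdd, or raise ValueError if no format matches."""
    text = raw.strip()
    for fmt in ACCEPTED_DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime("%Y%m%d")
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {raw!r}")


def standardize_number(raw: str, decimal_sep: str) -> Decimal:
    """Parse a number whose decimal separator is ',' or '.'.

    The other character is treated as a thousands separator and dropped.
    """
    if decimal_sep not in (",", "."):
        raise ValueError("decimal_sep must be ',' or '.'")
    thousands_sep = "." if decimal_sep == "," else ","
    cleaned = raw.strip().replace(thousands_sep, "").replace(decimal_sep, ".")
    try:
        return Decimal(cleaned)
    except InvalidOperation as exc:
        raise ValueError(f"unrecognized number: {raw!r}") from exc
```

Rejecting a value with `ValueError` corresponds to the validation question on the slide: content that does not match the agreed formats is not let into the lake.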


Organize an Area in the Lake

Example "by department"

Publish: directory per "department"

Organization top level: Department A (Function Area A), Department B, Department C

User level: User A (Own Directory A), User B, User C

Access labels for Users in Groups: Read Only, Read/Write
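
One way to read this layout is as a path convention. The Python sketch below rests on assumptions the slide does not state: that user directories sit under their department in a `users` folder, that the Publish zone is rooted at `/publish`, and that users get read/write only inside their own directory and read only elsewhere. All names are hypothetical.

```python
from pathlib import PurePosixPath

# Hypothetical root of the Publish zone; the slide does not name actual paths.
PUBLISH_ROOT = PurePosixPath("/publish")


def department_dir(department: str) -> PurePosixPath:
    """Organization top level: one directory per department."""
    return PUBLISH_ROOT / department


def user_dir(department: str, user: str) -> PurePosixPath:
    """User level: a user's own directory, assumed to sit under the department."""
    return department_dir(department) / "users" / user


def can_write(user: str, path: PurePosixPath) -> bool:
    """Assumed rule: read/write inside one's own directory, read only elsewhere."""
    try:
        parts = path.relative_to(PUBLISH_ROOT).parts
    except ValueError:  # path is outside the Publish zone
        return False
    return len(parts) >= 3 and parts[1] == "users" and parts[2] == user
```

Encoding the owner in the path keeps the write check a pure function of the path, so the same rule can be mirrored as folder-level ACLs.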


[Diagram: three alternative layouts, labelled A, B, and C, of Data lake and DW across Prod and Dev/test; one layout includes a Mirror of the data lake]


What is Azure Data Lake then…




Built on Open Standards

Built on YARN

Store lets all HDFS-compliant analytic applications, such as Hortonworks, Cloudera, and MapR, connect to it

HDInsight is 100% Apache Hadoop

Microsoft continues to contribute tens of thousands of lines of code and engineering hours to open source

[Diagram labels: Analytics Service, HDInsight, U-SQL, YARN, HDFS, Store]


Any type of analytics: batch, streaming, interactive

Batch, interactive, streaming, machine learning

Allows for exploratory analytics over your data

Do analytics with Hadoop and Microsoft solutions

Azure Data Lake analytics


Store data of any size, optimized for high performance

Store has no fixed limits on file sizes (PB-sized files)

Ultra fast read/write access

No code rewrite as you increase size of data stored

Optimized for large analytic systems with massive throughput

Optimized for IoT with a high volume of small writes



U-SQL

Be productive with U-SQL, a simple and powerful language

Simple and familiar, easily extensible

Unifies the declarative nature of SQL with the expressive power of C#

Familiar syntax to millions of .NET developers


Manage and secure your data assets

Auditing, alerting, access control - all from within a single web-based portal

Azure Active Directory integration for identity and access management


Cortana Intelligence Suite



Microsoft Data Solution Overview (CIS)

Data → Intelligence → Action

Data Sources: Apps, Sensors and devices

Information Management: Event Hubs, Data Catalog, Data Factory

Big Data Stores: SQL Data Warehouse, Data Lake Store

Machine Learning and Analytics: HDInsight (Hadoop and Spark), Stream Analytics, Data Lake Analytics, Machine Learning

Intelligence (capabilities): Cortana, Bot Framework, Cognitive Services

Dashboards & Visualizations: Power BI

Action: People, Automated Systems, Apps (Web, Mobile, Bots)


Thank you