Post on 07-Feb-2018


Big Data & Advanced Analytics - "Managed Services" on Azure

Guido Jacobs

Global Black Belt – TSP Big Data

[email protected]

Microsoft Deutschland GmbH

“Hyper-scale” infrastructure: that is Azure!

Regions (http://azure.microsoft.com/en-us/regions/)

• Central US (Iowa)
• West US (California)
• East US (Virginia)
• US Gov (Virginia)
• North Central US (Illinois)
• US Gov (Iowa)
• South Central US (Texas)
• Brazil South (Sao Paulo State)
• West Europe (Netherlands)
• China North * (Beijing)
• China South * (Shanghai)
• Japan West (Osaka)
• India South (Chennai)
• East Asia (Hong Kong)
• SE Asia (Singapore)
• India Central (Pune)
• Japan East (Tokyo, Saitama)
• Australia East (New South Wales)
• Australia South East (Victoria)
• Canada East (Quebec City)
• Canada Central (Toronto)
• India West (Mumbai)
• Germany North East ** (Magdeburg)
• Germany Central ** (Frankfurt)
• North Europe (Ireland)
• East US 2 (Virginia)
• United Kingdom
• US DoD West (TBD)
• US DoD East (TBD)
• West US 2 (California)
• West Central US
• Korea Central (Seoul)
• Korea South

Hadoop in the traditional sense

[Diagram: a cluster scaled out step by step, from two nodes (N1, N2) to four (N1 to N4) to eight (N1 to N8).]

Flexibility in the cloud

A spectrum from CONTROL to EASE OF USE, with user adoption as the second axis.

BIG DATA ANALYTICS

• IaaS Hadoop: Azure Marketplace (HDP | CDH | MapR). Any Hadoop technology.
• Managed Hadoop: Azure HDInsight. Workload optimized, managed clusters.
• Big Data as-a-service: Azure Data Lake Analytics. Specific apps in a multi-tenant form factor.

BIG DATA STORAGE

• Azure Data Lake Store
• Azure Storage

Azure Data Lake Store

A hyper-scale repository for Big Data analytics workloads

• Hadoop File System (HDFS) for the cloud
• No limits to scale
• Store any data in its native format
• Enterprise-grade access control, encryption at rest
• Optimized for analytic workload performance

Azure HDInsight

Hadoop and Spark as a Service on Azure

• Fully managed Hadoop and Spark for the cloud
• 100% open source Hortonworks Data Platform
• Clusters up and running in minutes
• Managed, monitored and supported by Microsoft with the industry’s best SLA
• Familiar BI tools for analysis, or open source notebooks for interactive data science
• 63% lower TCO than deploying your own Hadoop on-premises*

*IDC study “The Business Value and TCO Advantage of Apache Hadoop in the Cloud with Microsoft Azure HDInsight”

Azure Data Lake Analytics

A new distributed analytics service

• Distributed analytics service built on Apache YARN
• Elastic scale per query lets users focus on business goals, not on configuring hardware
• Includes U-SQL, a language that unifies the benefits of SQL with the expressive power of C#
• Integrates with Visual Studio to develop, debug, and tune code faster
• Federated query across Azure data sources
• Enterprise-grade role-based access control

Combining data easily

Benefits

• Large data volumes do NOT have to be moved between different stores
• Unified view of the data, regardless of where it is physically stored
• Less data maintenance because there are fewer copies
• One query language for ALL data sources
• Every data store retains its sovereignty
• Solution design driven by actual requirements
• SQL predicates are sent to the SQL sources
  • Filters
  • Joins

[Diagram: a U-SQL query runs in Azure Data Lake Analytics and issues queries to Azure Storage Blobs, SQL in Azure VMs, Azure SQL DB, Azure SQL Data Warehouse and Azure Data Lake Storage.]
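The point about sending SQL predicates to the SQL sources can be illustrated with a small sketch. This is not U-SQL and not how Azure Data Lake Analytics is implemented: an in-memory SQLite table stands in for a remote SQL source, and the table name `orders`, its columns and the row counts are assumptions made up for the illustration. It only shows why pushing a filter down to the source reduces the number of rows that have to be moved.

```python
import sqlite3


def make_source():
    """Create an in-memory SQLite table that stands in for a remote SQL source."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
    # Hypothetical data: every fourth order belongs to region "EU".
    rows = [(i, "EU" if i % 4 == 0 else "US", float(i)) for i in range(1000)]
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    return conn


def without_pushdown(conn):
    """Fetch every row from the source, then filter in the calling engine."""
    fetched = conn.execute("SELECT id, region, amount FROM orders").fetchall()
    result = [row for row in fetched if row[1] == "EU"]
    return len(fetched), result


def with_pushdown(conn):
    """Send the predicate to the source so only matching rows are transferred."""
    fetched = conn.execute(
        "SELECT id, region, amount FROM orders WHERE region = ?", ("EU",)
    ).fetchall()
    return len(fetched), fetched


conn = make_source()
moved_all, result_all = without_pushdown(conn)
moved_few, result_few = with_pushdown(conn)
print(f"rows moved without pushdown: {moved_all}, with pushdown: {moved_few}")
```

Both variants return the same result set; only the amount of data leaving the source differs.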

Separation of storage and compute

[Diagram elements: Data, ADL-A/HDI, Azure Data Lake Store, ADL-A/HDI, SQL DB, Power BI]

Separation of storage and compute (2)

[Diagram elements: Data, Azure Data Lake Store, several ADL-A/HDI clusters, HDI/R-Server, HDI/H2O AI, SQL DW, Power BI]
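A rough way to see what separating storage from compute buys is to compare cluster sizes. The sketch below is a simplified sizing model under stated assumptions: all figures (400 TB of data, 64 cores of demand, 10 TB and 16 cores per node) are hypothetical, and replication and other overheads are ignored. In a traditional cluster each node supplies both disk and CPU, so the larger of the two requirements dictates the node count; with a separate store, only the compute requirement does.

```python
import math


def coupled_nodes(storage_tb, cores_needed, tb_per_node, cores_per_node):
    """Node count when every node supplies both disk and CPU."""
    nodes_for_storage = math.ceil(storage_tb / tb_per_node)
    nodes_for_compute = math.ceil(cores_needed / cores_per_node)
    return max(nodes_for_storage, nodes_for_compute)


def decoupled_nodes(cores_needed, cores_per_node):
    """Compute node count when the data lives in a separate store."""
    return math.ceil(cores_needed / cores_per_node)


# Hypothetical workload and node sizes, chosen only for illustration.
storage_tb = 400
cores_needed = 64
tb_per_node = 10
cores_per_node = 16

coupled = coupled_nodes(storage_tb, cores_needed, tb_per_node, cores_per_node)
decoupled = decoupled_nodes(cores_needed, cores_per_node)
print(f"coupled cluster: {coupled} nodes, decoupled compute: {decoupled} nodes")
```

In this made-up case the coupled cluster is sized by its storage need, while the decoupled compute cluster can stay small and be resized or shut down without touching the data.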

What is Azure Data Catalog?

An enterprise-wide catalog in Azure that enables self-service discovery of data from any source

A metadata repository that allows users to register, enrich, understand, discover, and consume data sources

Enterprise-grade security in HDInsight

• Perimeter-level security: Virtual Networks, Network Security Groups (firewalls)
• Authentication: Kerberos, Azure Active Directory
• Authorization: Apache Ranger, RBAC for admin, POSIX ACLs for the data plane
• Data security: server-side encryption at rest, HTTPS/TLS in transit
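To make "POSIX ACLs for the data plane" more concrete, here is a simplified sketch of how a POSIX-style ACL check proceeds: owner first, then named users, then groups, then everyone else, with the mask limiting named-user and group entries. The `Acl` class, the user names and the permission strings are hypothetical, and the exact evaluation rules of any particular Azure service may differ from this textbook order.

```python
from dataclasses import dataclass, field


@dataclass
class Acl:
    """A simplified POSIX-style ACL for one file or directory."""
    owner: str
    owner_perms: str                      # e.g. "rwx" or "r-x"
    group: str
    group_perms: str
    other_perms: str
    named_users: dict = field(default_factory=dict)
    named_groups: dict = field(default_factory=dict)
    mask: str = "rwx"                     # upper bound for named and group entries


def _masked(perms, mask):
    """Keep only the permission letters that the mask also allows."""
    return "".join(p for p in perms if p in mask)


def allowed(acl, user, groups, wanted):
    """Return True if `user` (member of `groups`) may perform `wanted` ('r', 'w' or 'x')."""
    if user == acl.owner:
        return wanted in acl.owner_perms
    if user in acl.named_users:
        return wanted in _masked(acl.named_users[user], acl.mask)
    group_entries = []
    if acl.group in groups:
        group_entries.append(acl.group_perms)
    group_entries += [p for g, p in acl.named_groups.items() if g in groups]
    if group_entries:
        # A matching group entry decides the outcome; "other" is not consulted.
        return any(wanted in _masked(p, acl.mask) for p in group_entries)
    return wanted in acl.other_perms


# Hypothetical ACL: bob has a named entry "rw-", but the mask removes write access.
acl = Acl(owner="alice", owner_perms="rwx", group="analysts", group_perms="r-x",
          other_perms="---", named_users={"bob": "rw-"}, mask="r-x")
print(allowed(acl, "bob", [], "r"), allowed(acl, "bob", [], "w"))
```

The mask is what lets an administrator cap the effective rights of all named entries at once without editing each one.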

HDInsight - 3rd Party Solutions

• H2O – Sparkling Water: https://www.h2o.ai/sparkling-water/

• Datameer: https://www.datameer.com/

• Dataiku - Data Science Studio: https://www.dataiku.com/

• Cask Data App Platform (CDAP): http://cask.co/products/cdap/

• StreamSets Data Collector: https://streamsets.com/products/sdc/

• Spark Job Server for KNIME Spark Executor: https://www.knime.org/knime-spark-executor

• SnapLogic Hadooplex: https://snaplogic.com/solutions/microsoft-cortana-analytics-integration

On-premises HDP cluster and Azure big data storage

[Architecture diagram, reconstructed from its labels]

Azure Big Data Storage / Azure Data Lake Store

• Optimized for MPP-based analytical workloads
• Authorized access by Azure AD
• Access via ADL:// (OAuth2) or WebHDFS (OAuth2)
• No upfront cost, no pre-allocation, pay for stored data

Shared metadata: Ranger, Hive, Oozie

Compute clusters (HN = head node, WN = worker node)

• HDInsight cluster, type R-Server: HN, HN, WN, WN, … WN. RAM & CPU are configured to fulfill the workload requirements.
• HDInsight cluster, type Spark: HN, HN, WN, WN, … WN. RAM & CPU are configured to fulfill the workload requirements.
• HDInsight cluster, type 3rd Party: HN, HN, WN, WN, … WN. RAM & CPU are configured to fulfill the workload requirements.
• HDP (IaaS), type Cloudbreak: HN, HN, WN, WN, … WN.

Further labels in the diagram: on-premises HDP cluster, HN, HDFS (only for temp & spilling data), edge node, 3rd-party components, synchronisation on file level.
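The "Access via WebHDFS (OAuth2)" label can be sketched as a plain HTTP request. The example below only builds the request and does not send it. The host pattern `<account>.azuredatalakestore.net`, the account name `exampleaccount`, the path and the token placeholder are assumptions for illustration; `LISTSTATUS` is a standard WebHDFS operation, and the OAuth2 access token is assumed to have been obtained from Azure AD beforehand.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_liststatus_request(account, path, token):
    """Build (but do not send) a WebHDFS LISTSTATUS request with an OAuth2 bearer token."""
    # Assumed endpoint pattern for an Azure Data Lake Store account.
    base = f"https://{account}.azuredatalakestore.net/webhdfs/v1/"
    url = base + quote(path.lstrip("/")) + "?" + urlencode({"op": "LISTSTATUS"})
    return Request(url, headers={"Authorization": f"Bearer {token}"})


# Hypothetical account, path and token placeholder.
req = build_liststatus_request("exampleaccount", "/raw/2016/events", "<access-token>")
print(req.full_url)
```

Passing the request to `urllib.request.urlopen` would perform the call; a real token and an existing account are needed for that step.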

© 2016 Microsoft Corporation. All rights reserved.

The Business Value and TCO of HDInsight

• 418% 5-year ROI

• Four-month payback period

• 63% lower 5-year TCO than on-premises

• 66% higher staff efficiency than on-premises

• Get it at http://aka.ms/hdinsight
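For readers unfamiliar with the metrics, the sketch below shows how an ROI percentage and a payback period are commonly computed. It uses a simple undiscounted definition and hypothetical amounts chosen so that the result matches the headline figures; the IDC study's actual methodology and inputs are not given in these slides and may differ.

```python
def roi_percent(total_benefit, investment):
    """Simple ROI: net gain relative to the investment, in percent."""
    return (total_benefit - investment) / investment * 100


def payback_months(investment, monthly_net_benefit):
    """Months until cumulative net benefit covers the investment."""
    return investment / monthly_net_benefit


# Hypothetical amounts, not taken from the IDC study.
investment = 100_000.0
total_benefit_5y = 518_000.0
monthly_net_benefit = 25_000.0

roi = roi_percent(total_benefit_5y, investment)
payback = payback_months(investment, monthly_net_benefit)
print(f"ROI: {roi:.0f}%  payback: {payback:.0f} months")
```

Read this way, a 418% ROI means the benefits over five years exceed the investment by a bit more than four times its amount.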

*IDC study “The Business Value and TCO Advantage of Apache Hadoop in the Cloud with Microsoft Azure HDInsight”

RESOURCES ACROSS THE SALES CYCLE