Hadoop and SAS: Status and Outlook


Page 1 · Title: business-breakfast-hadoop-gernot-engel · Author: Katharina Wismayr · Subject: hadoop presentation · Created Date: 6/17/2015 3:29:47 PM

Copyright © 2012, SAS Institute Inc. All rights reserved.

VIENNA, JUNE 2015

GERNOT ENGEL, CLIENT SERVICE MANAGER, SAS AUSTRIA

Hadoop and SAS: Status and Outlook

Page 2

AGENDA

1. SAS & Hadoop: technologies and solutions
2. Demo: SAS/ACCESS to Hadoop, SAS Data Loader for Hadoop
3. SAS and Hadoop: usage scenarios and outlook

Page 3

SAS FOR HADOOP VISION

To be the Analytic and Data Management solution of choice for Hadoop.

Page 4

HADOOP BASICS: NOT OPPOSITES… BUT OFTEN NOT CONSIDERED TOGETHER!

Hadoop as a "Data Integration Platform"
…is a building block in the transformation of the IT landscape

Hadoop as a core component of a "next gen" BI and analytics strategy
…supports new questions raised by the business departments

Diagram labels: ETL process; analytics lifecycle (repeating): Identify / Formulate Problem -> Data Preparation -> Data Exploration -> Transform & Select -> Build Model -> Validate Model -> Deploy Model -> Evaluate / Monitor Results

Page 5

SAS & HADOOP: BASE TECHNOLOGIES & PRODUCTS

SAS/Access to Hadoop: Push some SAS processing from Hadoop into SAS
• SAS/Access to Hadoop (demo)
• SAS/Access to Cloudera Impala
• SAS DI Server

Embedded Process: Push SAS data processing to Hadoop with MapReduce
• SAS Scoring Accelerator for Hadoop
• SAS Code Accelerator for Hadoop *
• SAS Data Quality Accelerator for Hadoop *
• SAS Data Loader for Hadoop (* included) (demo)

In-Memory Analytics: Use Hadoop for storage persistence and commodity computing
• SAS Visual Analytics
• SAS Visual Statistics
• SAS In-Memory Statistics
• SAS HPA product bundles

Diagram labels: SAS, Hive, Impala, Score A, Code A, HPA, LASR

Page 6

SAS & HADOOP: DATA MANAGEMENT FOR HADOOP WITH SAS

• Data management with SAS
• "PROC HADOOP" (MapReduce + Pig scripting + HDFS commands)
• SAS Access to Hadoop
• Hive, Hive2, Impala
• Proc pushdown: FREQ, RANK, REPORT, SORT, SUMMARY/MEANS & TABULATE
• Hadoop plug-ins for SAS Data Integration Studio
• SAS Data Loader
• Point & click data management for Hadoop: reading in, transforming, and cleansing data in Hadoop
• Highlights: Sqoop integration, SAS profiling and data quality engines, transfer of data to the SAS In-Memory Analytics cluster
• HTML-based interface

Page 7

SHORT DEMO: SAS ACCESS TO HADOOP ENGINE

Architecture diagram labels: SAS Application Server (Access to Hadoop, JAR files, XML files, fileref); JDBC; Hadoop Cluster (HiveServer2, Hive Metastore, MapReduce compute framework, HDFS, data files)

PROC HADOOP -> pass-through of HDFS commands embedded in SAS code

SAS access to Hadoop -> 3 options:
• SAS access engine -> Hive library
• File access -> HDFS
• PROC HADOOP -> "pass through"

Page 8

SAS DATA INTEGRATION SERVER: GUI-SUPPORTED HADOOP TRANSFORMATIONS

Diagram labels: FROM Hadoop; WITH Hadoop; IN Hadoop; EP (shown three times)

Page 9

SAS DATA LOADER FOR HADOOP: FACT SHEET

• Runs SAS DS2 code, HiveQL, and DQ code on a Hadoop cluster
• Can load Hadoop data into an existing LASR server (for further analysis in VA / VS)
• Access to external databases (2.2)

Diagram labels: Hadoop Cluster; SAS LASR (VA / VS); SAS Data Loader; RDBMS

Page 10

SHORT DEMO: SAS DATA LOADER FOR HADOOP 2.2

1. ACQUIRE DATA / DISCOVER DATA: Access data, move it into Hadoop, and assess the data structure and content
• Copy Data to Hadoop
• Profile Data
• Identification Analysis
• Query

2. TRANSFORM DATA: Select data of interest, manipulate it, and structure it into the data format desired
• Query
• Select Columns
• Apply Filters
• Map Columns
• Sort / Order
• Calculate Columns
• Transpose data
• Aggregate
• Transform data

3. CLEANSE DATA: Put data into a consistent format
• Validate
• Parse
• Standardize

4. INTEGRATE DATA: Combine datasets, including data that has no common key, remove duplicate data, and create new data points through aggregation
• Join
• Create Match codes
• Sort & De-duplicate
• Aggregate
• Run a SAS program

5. DELIVER DATA: Load datasets into SAS LASR in-memory analytic server, create new Hadoop tables, and deliver data to other databases and apps
• Load SAS LASR
• Create tables
• Create views
• Copy from Hadoop

CONFIG TRIAL EDITION (diagram labels): Client PC; SAS Data Loader vApp (virt.); Hadoop Sandbox (virt.); Configuration Directory; Hadoop Config; HDFS; Hadoop Data; EP: SAS Embedded Process + QKB

Page 11

SAS® DATA LOADER FOR HADOOP: EXAMPLE - PREPARE AND LOAD CUSTOMER DATA

Business Analyst -> Action
• "I need my Customer data in Hadoop" -> Use Copy Data to Hadoop
• "I can see, but I can also fix, the data quality issue" -> Use Cleanse Data in Hadoop
• "I need to subset and summarize the data" -> Use Transform Data in Hadoop
• "I need to Load the data to LASR for visualization" -> Use Load Data to LASR
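The four steps of this example (copy, cleanse, transform, load) can be sketched in plain Python on a small in-memory list. This is only an illustration of the order and purpose of the steps, not SAS Data Loader code: the sample records, the function names, the revenue threshold, and the dictionary standing in for the LASR server are all assumptions made for the sketch.

```python
from collections import defaultdict

# Hypothetical customer records standing in for a source table.
SOURCE_ROWS = [
    {"id": 1, "country": " at ", "revenue": 120.0},
    {"id": 2, "country": "AT", "revenue": 80.0},
    {"id": 3, "country": "de", "revenue": 50.0},
    {"id": 4, "country": "DE ", "revenue": -1.0},  # invalid revenue
]


def copy_data(rows):
    """Step 1 (copy): take an independent copy of the source rows."""
    return [dict(row) for row in rows]


def cleanse(rows):
    """Step 2 (cleanse): standardize country codes, drop invalid rows."""
    cleaned = []
    for row in rows:
        if row["revenue"] < 0:
            continue
        cleaned.append({**row, "country": row["country"].strip().upper()})
    return cleaned


def transform(rows, min_revenue):
    """Step 3 (transform): subset by revenue, then sum revenue per country."""
    totals = defaultdict(float)
    for row in rows:
        if row["revenue"] >= min_revenue:
            totals[row["country"]] += row["revenue"]
    return dict(totals)


def load(summary, target):
    """Step 4 (load): write the summary into the target (a plain dict here)."""
    target.update(summary)
    return target


in_memory_target = {}
staged = copy_data(SOURCE_ROWS)
summary = transform(cleanse(staged), min_revenue=60.0)
load(summary, in_memory_target)
print(in_memory_target)
```

Cleansing before subsetting matters in this sketch: without standardized country codes, " at " and "AT" would be summarized as two different groups.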

Page 12

HADOOP IN PRACTICE: SUMMARY

• Hadoop differs from traditional DBMS
• Data processing requires a change in thinking
• Use Hive & SQL as needed
• The way data is stored opens up new possibilities
• Folders of flat files are managed as a table (comparable to SPD Server/SPD Engine)
• Work with the partitions
• Use the transfer transformations
• Hadoop is optimized for large tables
• Data quality functions with the DQ Accelerator are well suited to big data
• Data Loader for Hadoop: point & click tool for Hadoop that is suitable for business departments (data transfer RDBMS - HDFS, LASR Server, DQ, ETL ...)
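The idea that a folder of flat files is managed as one table, and that working with partitions pays off, can be sketched locally in Python. This is a minimal sketch, not Hive or SAS code: it only mimics the `column=value` folder naming that Hive uses for partitions, and the table layout, the file name `part-00000.csv`, and both helper functions are assumptions made for the illustration.

```python
import csv
import tempfile
from pathlib import Path


def write_partition(table_dir, year, rows):
    """Write rows as one flat file inside a 'year=<value>' partition folder."""
    part_dir = Path(table_dir) / f"year={year}"
    part_dir.mkdir(parents=True, exist_ok=True)
    with open(part_dir / "part-00000.csv", "w", newline="") as handle:
        csv.writer(handle).writerows(rows)


def read_table(table_dir, year=None):
    """Read the folder as one table; a year filter skips other partitions."""
    result = []
    for part_dir in sorted(Path(table_dir).glob("year=*")):
        part_year = int(part_dir.name.split("=", 1)[1])
        if year is not None and part_year != year:
            continue  # partition pruning: files in this folder are never opened
        for data_file in sorted(part_dir.glob("*.csv")):
            with open(data_file, newline="") as handle:
                for customer, amount in csv.reader(handle):
                    result.append((part_year, customer, float(amount)))
    return result


with tempfile.TemporaryDirectory() as sales_table:
    write_partition(sales_table, 2014, [("alice", 10.0), ("bob", 20.0)])
    write_partition(sales_table, 2015, [("carol", 30.0)])
    all_rows = read_table(sales_table)
    rows_2015 = read_table(sales_table, year=2015)

print(len(all_rows), rows_2015)
```

The partition value lives in the folder name rather than in the files, so a filter on that column decides which folders to read before any data is touched. On large tables this is the main reason to align queries with the partitioning.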

Page 13

SAS DATA LOADER FOR HADOOP: WHAT'S NEW - ROADMAP

• Version 2.3 (9.4M3)

• Enhancements

• Profile Threading & Performance Enhancements

• SAS User Defined Formats

• Hive 14 Enhancements

• Distribution Support

• MapR / PHD (stretch)

• New Directives: Hive Node, Delete Node

• LDAP Authentication

• Future (2.4+) *

• New Directives : Merge, Score

• Unstructured Text Processing

• Major Features

• Spark Integration

• Chained Directives: Execute Jobs in Parallel

• Federation Server Integration

• Automated & Smart Profiling

* features are subject to change

Page 14

SAS 9.4M3: WHAT'S NEW

• MapR support for all SAS components
• PROC SQOOP
• SAS/Access to Hadoop
• Improved: performance, passing through of error descriptions, implicit pass-through (where exists, between)
• SAS/ACCESS to HAWQ
• SAS/ACCESS to Impala
• BASE Proc pushdown
• Embedded Process (Accelerators)
• Access to data via HCatalog (Hive SerDes)
• Usable file formats: Parquet, ORC, Avro, Sequence, RCFile
• Code Accelerator: allows multiple input data sources, supports the MERGE statement

Page 15

SAS & HADOOP: IN-MEMORY TECHNOLOGY - BI & ANALYTICS

• SAS High-Performance Analytics
• SAS procedures from the areas of statistics, data mining, text analytics, and optimization, ported to distributed in-memory technology
• Front end: Enterprise Miner
• Focus on batch processing and production operation
• LASR-based In-Memory Technology
• SAS Visual Analytics / Visual Statistics
• Business analysts and data scientists
• Focus on interactive analysis
• SAS In-Memory Statistics
• Focus on programming

Page 16

SAS & HADOOP USAGE SCENARIOS

HADOOP ANALYTICS FOR SPECIAL TOPICS AND AS INPUT / ENRICHMENT FOR THE EDW

Diagram labels: Operational Data Sources; EDW; Data Mart (three times); Analytic Mart (three times); BI and Analytics

Page 17

SAS & HADOOP USAGE SCENARIOS

HADOOP DATA PLATFORM AS "STAGING LAYER" - "DATA LAKE"

Loading of HDFS; reporting structures in Hadoop, data appliances, or RDBMS

Diagram labels: Operational Data Sources; EDW; Data Mart (twice); Analytic Mart (twice); BI and Analytics

Page 18

HADOOP IN USE: RESULTS OF A SURVEY AMONG SAS CUSTOMERS WHO ALREADY USE HADOOP (EMEA/AP, FEBRUARY 2015)

• Customers by industry
• Hadoop distributions in use
• Products in use
• Usage scenarios

Usage scenarios (chart values):
• Offload EDWH / Cost Reduction: 32%
• "Data Lake": 13%
• "Analytics": 42%
• Fraud: 13%

Page 19

SAS OFFERING: BIG DATA LAB

Ready-to-use complete package for developing big data use cases independently, at a fixed price

TECHNOLOGY

Sizing: S, M, L

Deployment: On-Premise, Cloud

Software solutions:

Data management
► Data Loader for Hadoop
► Access to Hadoop
► Metadata management

Analytics
► Visual Analytics
► Visual Statistics
► In-Memory Statistics

SERVICE

► Installation
► Configuration
► Training
► Implementation of an example use case

Additional services that can be booked:
► Coaching and provision of experts (data scientist, data management expert)
► Consulting

Page 20

BIG DATA LAB: YOUR BENEFITS

• You get started faster.
• You minimize the risk of misdirected investments.
• You avoid duplicate work and duplicate investments.
• You pay for exactly what you need.

Page 21

SUMMARY SAS & HADOOP: BROAD SUPPORT, MORE TO COME!

1. Data Management: SAS optimizes and simplifies access to data in Hadoop.
2. In-Memory Analytics: SAS extends and accelerates analytics on Hadoop data.
3. In-Database Processing: SAS moves (analytical) SAS functionality into the Hadoop cluster.

Page 22

OUR OFFER: THINK BIG, START NOW!

• BIG DATA LAB
• All information is available at www.sas.de/bigdatalab
• TRY THE PRODUCTS
• Download SAS Data Loader for Hadoop free of charge and try it for 90 days: www.sas.de/dataloader
• Try SAS Visual Analytics (demo) free of charge: www.sas.de/visualanalytics

Get in touch with us!

Page 23

INFORMATION - CONTACT

SAS AND HADOOP INFORMATION:

http://www.sas.com/de_de/software/sas-hadoop.html - http://www.sas.com/en_us/software/sas-hadoop.html

White papers of interest:

http://www.sas.com/en_us/whitepapers/big-data-analytics-hadoop-107049.html

http://www.sas.com/en_us/whitepapers/bringing-power-of-sas-to-hadoop-105776.html

BARC: Big data analytics in the DACH region:

http://www.sas.com/de_de/whitepapers/ba-wp-barc-big-data-analytics-2014-2298353.html

Webinars: http://www.sas.com/de_at/webinars.html

• Big Data Analytics with SAS & Hadoop

• Big Data Lab

Code examples: http://support.sas.com/resources/papers/proceedings14/SAS033-2014.pdf

Page 24

THANKS A LOT!

QUESTIONS - next steps?

[email protected], [email protected], [email protected]

THANK YOU!