SAS DATA LOADER FOR HADOOP
Sergio Uassouf, Information Management and Infrastructure Practice Lead
SAS FORUM ARGENTINA 2015, MAY 5
Copyright © 2012, SAS Institute Inc. All rights reserved.




SAS DATA LOADER FOR HADOOP: AGENDA

• BIG DATA AND HADOOP
• SAS DATA LOADER FOR HADOOP
• DEMONSTRATION
• CLOSING REMARKS


BIG DATA AND HADOOP


BIG DATA: STORE AND ANALYZE LARGE VOLUMES OF INFORMATION AT LOW COST

SAS ON HADOOP

MPP

If you can store much more information at a much lower cost...

And you can process it in much less time.

Then you do not need to build models on only a subset of the data...

And you can run as many iterations as you need.

So you can store and process information that you previously could not.


THE NEED TO ADDRESS: STORE AND ANALYZE LARGE VOLUMES OF INFORMATION AT LOW COST

• ALL THE CALL DETAIL RECORDS
• ALL THE TRANSACTIONS
• ALL THE WEBSITE BROWSING SEQUENCES
• ALL THE CALL CENTER CONVERSATIONS

AND ANALYZE THEM IN THEIR ENTIRETY...

RUNNING ALL THE ITERATIONS YOU NEED...

AT A VERY LOW RELATIVE COST


Work team

Well-managed data and IT architecture provide greater agility across the entire analytics cycle.


BIG DATA: A VACCINE AGAINST HYPE

Explain to me how you do it.

Show me how it works.


PUTTING THINGS IN CONTEXT: BASIC COMPONENTS TODAY

Since the early days of computing, a computer, whether personal or enterprise, has consisted of three main components:

• MEMORY (RAM)
• STORAGE UNITS (Disk)
• PROCESSING UNITS (CPU)

But now... in Massively Parallel Processing!

[Diagram: multiple nodes side by side, each with its own Disk, RAM, and CPUs]
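The massively parallel layout on this slide, where every node has its own disk, RAM, and CPUs, can be illustrated with a minimal Python sketch. It is not how Hadoop or SAS implement MPP: threads stand in for nodes, and the function names and sample data are hypothetical. The point it shows is that each "node" works only on its own slice of the data and that only small partial results are combined at the end.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n_nodes):
    """Split rows round-robin so each simulated node owns its own slice."""
    return [rows[i::n_nodes] for i in range(n_nodes)]


def local_sum_count(node_rows):
    """Work done on one simulated node, using only that node's local data."""
    return sum(node_rows), len(node_rows)


def parallel_mean(rows, n_nodes=4):
    """Compute a mean by combining small per-node partial results."""
    parts = partition(rows, n_nodes)
    # Threads are an assumption standing in for separate cluster nodes.
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = list(pool.map(local_sum_count, parts))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


# Hypothetical sample data: call durations in seconds.
call_durations = list(range(1, 101))
print(parallel_mean(call_durations))
```

Only one (sum, count) pair per node travels to the coordinator, which is why adding nodes lets the same job cover all the data rather than a sample.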


HADOOP: HADOOP 2.0 PROGRAMMING MODELS

Figure taken from Hortonworks


HADOOP: HADOOP 2.0 PROGRAMMING MODELS

Figure taken from Hortonworks

SAS LASR SERVER

SAS EMBEDDED PROCESS (SAS DS2 LANGUAGE)


LASR SERVER: SAS HIGH PERFORMANCE MODULES

ANALYTIC SOLUTIONS

PARALLEL, IN-MEMORY PROCESSING

DIFFERENTIATORS OF SAS PRODUCTS:

• ANALYTICAL POWER
• INTERACTIVITY / MULTI-USER CONCURRENCY
• FLEXIBILITY / EASE OF USE

[Chart: products positioned by Analytical Power and by Interactivity / Multi-User Concurrency (from Batch to Interactive): High Perf. Data Mining, High Perf. Statistics, IMSTAT for Hadoop*, Visual Analytics, Visual Statistics]

*SAS® In-Memory Statistics for Hadoop


SAS DATA LOADER FOR HADOOP


BIG DATA: CHALLENGES / OBSTACLES

Source: Gartner (Sep 2014), "Big Data Investment Grows but Deployments Remain Scarce in 2014," by Nick Heudecker and Lisa Kart


HADOOP: DIVERSITY AND COMPLEXITY OF REQUIRED SKILLS

PRODUCTIVITY

Performing even the simplest tasks in Hadoop typically requires mastering disparate tools and writing hundreds of lines of code.

Fact: Only a limited number of users have the necessary Hadoop skills

• MapReduce
• Pig Latin
• HiveQL
• HDFS
• Sqoop and Oozie


HADOOP: TOOLS ARE NOT READY FOR BIG DATA

Big data brings new requirements:

• Access to HDFS
• Parallel loads
• New native file types
• Knowledge of file structures
• New languages & code
• Need to transform data in-cluster

User tools are not engineered to process data inside Hadoop.

• Tools are not optimized for Hadoop
• Users move data out of Hadoop to do data management and data quality
• This requires more processing time
• Data is duplicated and more storage is required
• Users do not use the Hadoop platform as it was designed

MAKE A COMMENT


BIG DATA MANAGEMENT: ANALYST RECOMMENDATION

Recommendation

“Use self-service interactive data preparation tools to enhance analyst productivity.” and

“improve the quality of data”

– Gartner, “Data Preparation Is Not an Afterthought”


BIG DATA MANAGEMENT: SELF-SERVICE DATA PREPARATION

Given that data preparation is 70-80% of the work involved in any analytic project, existing customers struggling with managing a Hadoop environment are the low-hanging fruit.

The rise of self-service data-preparation tools … is putting data management directly into the hands of analysts

SAS Data Loader for Hadoop showcases the company's solid engineering talent and reputation for building high-quality software


SAS ON HADOOP

SAS Data Loader for Hadoop

Visual Analytics / Statistics

TO ANALYZE DATA IN PARALLEL...

WE MUST BE ABLE TO PREPARE DATA IN PARALLEL


SAS DATA LOADER FOR HADOOP - WHAT DOES IT DO?

A SAS data management solution whose goals are to transfer data to and from Hadoop, and to structure, cleanse, and transform data within Hadoop.

SAS Data Loader for Hadoop makes the Hadoop environment productive by removing its complexity barriers and making data accessible and easy to use.


SAS DATA LOADER FOR HADOOP - SUMMARY OF CAPABILITIES

1 ACQUIRE DATA / DISCOVER DATA

Access data, move it into Hadoop, and assess the data structure and content

• Copy Data to Hadoop
• Profile Data
• Identification Analysis
• Query

2 TRANSFORM DATA

Select data of interest, manipulate it, and structure it into the data format desired

• Query
• Select Columns
• Apply Filters
• Map Columns
• Sort / Order
• Calculate Columns
• Transpose data
• Aggregate
• Transform data

3 CLEANSE DATA

Put data into a consistent format

• Validate
• Parse
• Standardize

4 INTEGRATE DATA

Combine datasets, including data that has no common key, remove duplicate data, and create new data points thru aggregation

• Join
• Create Match codes
• Sort & De-duplicate
• Aggregate
• Run a SAS program

5 DELIVER DATA

Load datasets into SAS LASR in-memory analytic server, create new Hadoop tables, and deliver data to other databases and apps

• Load SAS LASR
• Create tables
• Create views
• Copy from Hadoop
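As a rough illustration of what the discover, cleanse, and integrate stages in this summary do to the data, here is a small Python sketch on in-memory records. It is not the SAS Data Loader API: every function name, the sample records, and the crude `match_code` rule are assumptions made for the example, and real match codes are considerably more sophisticated.

```python
from collections import defaultdict

# Hypothetical customer records; in practice these would be rows in a Hadoop table.
RAW = [
    {"name": " Ana Perez ", "city": "buenos aires", "balance": "120.50"},
    {"name": "ana  perez", "city": "Buenos Aires", "balance": "80"},
    {"name": "Luis Gomez", "city": "CORDOBA", "balance": "not-a-number"},
    {"name": "LUIS GOMEZ", "city": "Cordoba", "balance": "200"},
]


def is_number(text):
    """Validate that a text field parses as a number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def profile(rows):
    """Discover: count rows and how many balances fail validation."""
    invalid = sum(not is_number(r["balance"]) for r in rows)
    return {"rows": len(rows), "invalid_balance": invalid}


def cleanse(rows):
    """Cleanse: drop invalid balances, standardize names and cities."""
    out = []
    for r in rows:
        if not is_number(r["balance"]):
            continue
        out.append({
            "name": " ".join(r["name"].split()).title(),
            "city": r["city"].strip().title(),
            "balance": float(r["balance"]),
        })
    return out


def match_code(name):
    """Crude stand-in for a match code: lowercase letters and digits only."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def integrate(rows):
    """Integrate: de-duplicate on match code, then aggregate balance by city."""
    seen, totals = set(), defaultdict(float)
    for r in rows:
        code = match_code(r["name"])
        if code in seen:
            continue  # keep only the first record per match code
        seen.add(code)
        totals[r["city"]] += r["balance"]
    return dict(totals)


print(profile(RAW))
print(integrate(cleanse(RAW)))
```

The match code lets records without a shared key ("ana  perez" and " Ana Perez ") be recognized as the same customer before aggregation.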


SAS DATA LOADER FOR HADOOP

DEMONSTRATION


SAS DATA LOADER FOR HADOOP

CLOSING REMARKS


USERS: BUSINESS ANALYST

Activities:

• Self-service access to data
• Query and manipulate data
• Copy data to/from Hadoop
• Load data into SAS LASR


USERS: DATA SCIENTISTS / STATISTICIANS

[Diagram: Event data, Customer data, and Log files go through Data Preparation to produce an Analytics ready dataset]

Activities:

• Create an analytics ready dataset
• Transform and manipulate data
• Optional: Write SAS DS2 code
• Load data into SAS LASR server


USAGE SCENARIOS: BANKING SCENARIO

Teradata, Oracle

1. Access data, move it into Hadoop, and assess the data structure and content
2. Select data of interest, manipulate it, and structure it into the data format desired
3. Put data into a consistent format
4. Combine datasets, remove duplicate data
5. Load dataset into SAS LASR

• Moves Tier 1 customer data from staging areas to Hadoop
• Query data in Hadoop to ensure only High Value customers are selected
• Fix the data with cleansing capabilities

User

• Senior management analysed the portfolio value of high net worth customers over a period of time


SAS DATA LOADER FOR HADOOP: DIFFERENTIATORS

• Point-and-click user interface designed for self-service data preparation.
• Data preparation in Hadoop in a way similar to what SAS provides for other data sources.
• Consistency and reuse: applies data quality standards to the data stored in Hadoop.
• Comprehensive set of tools for the end-to-end analytics life cycle.
• Designed specifically for Hadoop's goals. Very simple to operate.
• Enables data movement, data processing, and data quality tasks to run in parallel without writing code.
• Loads data into the SAS LASR Analytic Server.
• Big Compute: moves the processing to where the data resides.


SAS DATA LOADER FOR HADOOP: FOR MORE INFORMATION

Download the Free Trial of SAS Data Loader for Hadoop at:

http://sas.com/dataloader

Learn more about SAS and Hadoop:

http://sas.com/hadoop

Big Data Matters Webinar Series:

www.SAS.com/bigdatamatters

Follow us on Twitter: @sasdatamgmt

Like us on Facebook: SAS Software

DOWNLOAD THE

FREE TRIAL!

www.SAS.com

TRY THE FREE DEMO IN YOUR OWN INSTALLATION

WE ARE AT YOUR DISPOSAL TO HELP YOU

CALL US TO DISCUSS SETTING UP YOUR HADOOP ENVIRONMENT


HADOOP FILE FORMATS: SEQUENCE


HADOOP FILE FORMATS: OPTIMIZED ROW COLUMNAR


HADOOP FILE FORMATS: PARQUET

Row groups in columnar format.

The footer contains the offset of each group and its schema.

Language independent.
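The layout described on this slide (row groups stored column by column, plus a footer that records each group's offset and the schema) can be mimicked with a toy Python sketch. This is not the real Parquet binary format and does not use any Parquet library: the JSON encoding, the function names, and the sample rows are all assumptions chosen to keep the example self-contained.

```python
import json
import struct


def write_toy_file(rows, group_size):
    """Serialize rows into columnar row groups followed by a footer."""
    columns = list(rows[0].keys())
    body = b""
    groups = []
    for start in range(0, len(rows), group_size):
        group = rows[start:start + group_size]
        # Columnar: all values of one column are stored together.
        columnar = {c: [r[c] for r in group] for c in columns}
        encoded = json.dumps(columnar).encode("utf-8")
        groups.append({"offset": len(body), "length": len(encoded)})
        body += encoded
    footer = json.dumps({"schema": columns, "row_groups": groups}).encode("utf-8")
    # The last 4 bytes hold the footer length so a reader can find it from the end.
    return body + footer + struct.pack("<I", len(footer))


def read_footer(blob):
    """Locate and decode the footer using the trailing length field."""
    (footer_len,) = struct.unpack("<I", blob[-4:])
    return json.loads(blob[-4 - footer_len:-4])


def read_column(blob, group_index, column):
    """Use the footer offsets to jump to one row group and return one column."""
    meta = read_footer(blob)["row_groups"][group_index]
    # Toy simplification: the whole group is decoded to extract one column.
    group = json.loads(blob[meta["offset"]:meta["offset"] + meta["length"]])
    return group[column]


# Hypothetical sample rows.
rows = [
    {"id": 1, "city": "Rosario"},
    {"id": 2, "city": "Salta"},
    {"id": 3, "city": "Mendoza"},
]
blob = write_toy_file(rows, group_size=2)
print(read_footer(blob)["schema"])
print(read_column(blob, 0, "city"))
```

Because the footer sits at the end and carries the offsets, a reader can skip straight to the row group it needs instead of scanning the file from the start.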