elastic map reduce

Big Data on Amazon EC2: Elastic Map Reduce. AWS Meetup Ciudad de México. Israel Gaytán, CTO, Vitatronix. [email protected] @isragaytan

Uploaded by: israel-gaytan

Posted on 15-Aug-2015


TRANSCRIPT

Page 1: Elastic map reduce

Big Data on Amazon EC2: Elastic Map Reduce

AWS Meetup Ciudad de México

Israel Gaytán, CTO, Vitatronix

[email protected], @isragaytan

Page 2: Elastic map reduce

“Toto, I think we're not in Kansas anymore”

Dorothy

The Wizard of Oz

Page 3: Elastic map reduce

BIG DATA

Page 4: Elastic map reduce

AWS infrastructure and services

Page 5: Elastic map reduce

Big Data services

Elastic Map Reduce: batch processing with Apache Hadoop

Kinesis: stream processing for big data

Data Pipeline: orchestrates complex, data-driven workflows

Machine Learning: applies machine learning algorithms to your data to build models and make predictions

Page 6: Elastic map reduce

What is MapReduce?

• MapReduce is the programming model at the heart of Hadoop

• It takes massive amounts of data and splits the work across a number of EC2 instances

• It has two phases: the Map phase and the Reduce phase

• Jobs are typically written in Java and shipped to the cluster for distributed execution
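To make the two phases concrete, here is a minimal single-process Python simulation; it is not Hadoop's Java API, which the slide refers to. The input is split into chunks (a stand-in for one map task per split), each record is mapped to (word, 1) pairs, the pairs are hash-partitioned so every occurrence of a key reaches the same reducer, and each reducer sums its values. The word-count task and all function names are illustrative assumptions.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in record.split():
        yield word.lower(), 1


def reduce_phase(key, values):
    """Reduce: combine all values emitted for one key."""
    return key, sum(values)


def run_mapreduce(records, num_splits=3, num_reducers=2):
    # Split the input into chunks (stand-in for Hadoop's input splits).
    splits = [records[i::num_splits] for i in range(num_splits)]
    # Map every split, then partition pairs by key hash so that all
    # values for a given key end up in the same reducer's partition.
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for split in splits:
        for record in split:
            for key, value in map_phase(record):
                partitions[hash(key) % num_reducers][key].append(value)
    # Each reducer sees its keys in sorted order, as in Hadoop.
    results = {}
    for partition in partitions:
        for key in sorted(partition):
            k, v = reduce_phase(key, partition[key])
            results[k] = v
    return results


print(run_mapreduce(["Login successful", "Login failed", "Disk write failure"]))
```

The final counts do not depend on how keys are partitioned; partitioning only decides which reducer does the work, which is what lets the Reduce phase run in parallel across instances.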

Page 7: Elastic map reduce

What is MapReduce?

Page 8: Elastic map reduce

Elastic Map Reduce workflow

Page 9: Elastic map reduce

Data flow

Page 10: Elastic map reduce

Road

• Put the data in S3

• Write the MapReduce job in Java and upload the jar (driver) to S3

• Write Hive or Pig scripts

• Write the results back to S3

• Build visualizations of the data
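Once the jar is in S3, it runs on the cluster as an EMR step. The sketch below only builds the step description as a plain dict in the shape that boto3's EMR client accepts in its `Steps` parameter; it does not call AWS. The bucket, jar path, driver class, and the convention that the driver takes input and output paths as its two arguments are all hypothetical.

```python
def build_jar_step(name, jar_s3_uri, main_class, input_s3, output_s3):
    """Build one entry for the `Steps` list of boto3's EMR
    add_job_flow_steps (the same shape is used by run_job_flow)."""
    return {
        "Name": name,
        # Keep the cluster running if this step fails.
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": jar_s3_uri,        # driver jar previously uploaded to S3
            "MainClass": main_class,  # driver class inside the jar
            # Hypothetical convention: the driver reads input/output paths
            # from its command-line arguments.
            "Args": [input_s3, output_s3],
        },
    }


step = build_jar_step(
    name="log-count",
    jar_s3_uri="s3://my-bucket/jars/log-analysis.jar",  # hypothetical path
    main_class="com.example.LogDriver",                 # hypothetical class
    input_s3="s3://my-bucket/input/",
    output_s3="s3://my-bucket/output/",
)
print(step["HadoopJarStep"]["Args"])
```

With credentials configured, the dict would be submitted with something like `boto3.client("emr").add_job_flow_steps(JobFlowId="j-XXXXXXXX", Steps=[step])`, where the cluster ID is a placeholder.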

Page 11: Elastic map reduce

Log Files

Apr 23 20:06:16 hostname.local ./generate-log.sh[22382]: INFO: Login successful for user Alice
Apr 23 20:06:16 hostname.local ./generate-log.sh[22382]: INFO: Login successful for user Bob
Apr 23 20:06:16 hostname.local ./generate-log.sh[22382]: WARNING: Login failed for user Mallory
Apr 23 20:06:16 hostname.local ./generate-log.sh[22382]: SEVERE: Received SEGFAULT signal from process Eve
Apr 23 20:06:16 hostname.local ./generate-log.sh[22382]: INFO: Logout occurred for user Alice
Apr 23 20:06:16 hostname.local ./generate-log.sh[22382]: INFO: User Walter accessed file /var/log/messages
Apr 23 20:06:16 hostname.local ./generate-log.sh[22382]: INFO: Login successful for user Chuck
Apr 23 20:06:16 hostname.local ./generate-log.sh[22382]: INFO: Password updated for user Craig
Apr 23 20:06:16 hostname.local ./generate-log.sh[22382]: SEVERE: Disk write failure
Apr 23 20:06:16 hostname.local ./generate-log.sh[22382]: SEVERE: Unable to complete transaction - Out of memory
Apr 23 20:06:17 hostname.local ./generate-log.sh[22382]: INFO: Login successful for user Alice
Apr 23 20:06:17 hostname.local ./generate-log.sh[22382]: INFO: Login successful for user Bob
Apr 23 20:06:17 hostname.local ./generate-log.sh[22382]: WARNING: Login failed for user Mallory
Apr 23 20:06:17 hostname.local ./generate-log.sh[22382]: SEVERE: Received SEGFAULT signal from process Eve
Apr 23 20:06:17 hostname.local ./generate-log.sh[22382]: INFO: Logout occurred for user Alice
Apr 23 20:06:17 hostname.local ./generate-log.sh[22382]: INFO: User Walter accessed file /var/log/messages
Apr 23 20:06:17 hostname.local ./generate-log.sh[22382]: INFO: Login successful for user Chuck
Apr 23 20:06:17 hostname.local ./generate-log.sh[22382]: INFO: Password updated for user Craig
Apr 23 20:06:17 hostname.local ./generate-log.sh[22382]: SEVERE: Disk write failure
Apr 23 20:06:17 hostname.local ./generate-log.sh[22382]: SEVERE: Unable to complete transaction - Out of memory
Apr 23 20:06:17 hostname.local ./generate-log.sh[22382]: INFO: Login successful for user Alice
Apr 23 20:06:17 hostname.local ./generate-log.sh[22382]: INFO: Login successful for user Bob
Apr 23 20:06:17 hostname.local ./generate-log.sh[22382]: WARNING: Login failed for user Mallory
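Each log line follows a syslog-like layout: timestamp, host, process with PID, severity level, and message. A minimal Python sketch of splitting one line into those fields with a regular expression follows; the field names and the pattern are assumptions inferred from the sample lines, not taken from the talk.

```python
import re

# Assumed layout: "<Mon> <day> <HH:MM:SS> <host> <process>[<pid>]: <LEVEL>: <message>"
LOG_PATTERN = re.compile(
    r"^(?P<timestamp>\w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) (?P<process>\S+)\[(?P<pid>\d+)\]: "
    r"(?P<level>[A-Z]+): (?P<message>.*)$"
)


def parse_line(line):
    """Return the line's fields as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None


sample = [
    "Apr 23 20:06:16 hostname.local ./generate-log.sh[22382]: INFO: Login successful for user Alice",
    "Apr 23 20:06:16 hostname.local ./generate-log.sh[22382]: SEVERE: Disk write failure",
]
print([parse_line(line)["level"] for line in sample])
```

A mapper could use a parser like this to choose its output key, for example the timestamp, the level, or the user named in the message.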

Page 12: Elastic map reduce

Mapper and Reducer
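The mapper and reducer code from this slide is not in the transcript. The Results slide pairs each timestamp with a number, which suggests the job counts log lines per second; the sketch below assumes exactly that. It follows Hadoop Streaming's tab-separated `key<TAB>value` text convention rather than the Java API used in the talk, and it assumes the key is the first 15 characters of each line (the `Apr 23 20:06:16` timestamp). `run_locally` is a hypothetical helper that replaces Hadoop's shuffle-and-sort with a plain `sorted()`.

```python
from itertools import groupby


def mapper(lines):
    """Emit "<timestamp>\t1" for every log line (timestamp = first 15 chars)."""
    for line in lines:
        line = line.rstrip("\n")
        if len(line) >= 15:  # skip blank or truncated lines
            yield f"{line[:15]}\t1"


def reducer(sorted_lines):
    """Sum the counts per key. Input must be sorted by key, which Hadoop
    guarantees between the map and reduce phases."""
    pairs = (line.split("\t", 1) for line in sorted_lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(value) for _, value in group)}"


def run_locally(lines):
    # Local stand-in for Hadoop's shuffle and sort.
    return list(reducer(sorted(mapper(lines))))


logs = [
    "Apr 23 20:06:16 hostname.local ./generate-log.sh[22382]: INFO: Login successful for user Alice",
    "Apr 23 20:06:17 hostname.local ./generate-log.sh[22382]: INFO: Login successful for user Bob",
    "Apr 23 20:06:16 hostname.local ./generate-log.sh[22382]: SEVERE: Disk write failure",
]
print(run_locally(logs))
```

Keeping the reducer dependent only on sorted input, not on an in-memory dict, mirrors how Hadoop feeds reducers and keeps memory use flat regardless of how many keys there are.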

Page 13: Elastic map reduce

Driver

Page 14: Elastic map reduce

Road

Page 15: Elastic map reduce

Road

Page 16: Elastic map reduce

Road

Page 17: Elastic map reduce

Road

Page 18: Elastic map reduce

Road

Page 19: Elastic map reduce

Results

Apr 23 20:06:16 50
Apr 23 20:06:17 159
Apr 23 20:06:18 35
Apr 23 20:06:17 160

Page 20: Elastic map reduce

Hive y Pig

• Hive is a SQL-like language

• Pig is a data-flow language specialized for semi-structured or unstructured data

• Hive and Pig are easier to use than native MapReduce
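To show why a SQL-like language is easier than hand-written MapReduce, here is a per-timestamp count of log lines expressed as a single `GROUP BY` query. It uses Python's built-in SQLite only as a local stand-in for Hive; a HiveQL query for this aggregation would look similar, though in Hive the table would typically be defined over files in S3 rather than filled with `INSERT`. The table and column names are hypothetical, and the timestamp is assumed to be the first 15 characters of each line.

```python
import sqlite3


def count_per_second(lines):
    """Count log lines per timestamp with one declarative query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE logs (ts TEXT, line TEXT)")
    conn.executemany(
        "INSERT INTO logs VALUES (?, ?)",
        [(line[:15], line) for line in lines],
    )
    # The whole map + shuffle + reduce pipeline collapses into GROUP BY.
    rows = conn.execute(
        "SELECT ts, COUNT(*) FROM logs GROUP BY ts ORDER BY ts"
    ).fetchall()
    conn.close()
    return rows


print(count_per_second([
    "Apr 23 20:06:16 hostname.local ./generate-log.sh[22382]: INFO: Login successful for user Alice",
    "Apr 23 20:06:16 hostname.local ./generate-log.sh[22382]: SEVERE: Disk write failure",
    "Apr 23 20:06:17 hostname.local ./generate-log.sh[22382]: INFO: Login successful for user Bob",
]))
```

Hive compiles a query like this into MapReduce jobs behind the scenes, which is the trade-off the slide points at: less code, less control.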

Page 21: Elastic map reduce

Hive and Pig on EMR

Page 22: Elastic map reduce

EMR?

• Launch a cluster

• Choose an AMI or operating system

• Choose a Hadoop distribution such as Cloudera, Hortonworks, or MapR

• Choose your infrastructure carefully

• Use placement groups for low latency!

• Keep an eye on IOPS
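Several of these choices end up as parameters of a single cluster request. The sketch below builds keyword arguments in the shape accepted by boto3's EMR `run_job_flow`, including provisioned-IOPS (`io1`) EBS volumes on the core nodes as one way to act on the IOPS point; it does not call AWS and does not cover placement groups or third-party distributions. The release label, instance type, node count, volume size, and IOPS value are hypothetical, and current EMR selects software with `ReleaseLabel`, whereas older clusters of the talk's era chose an AMI version instead.

```python
def build_cluster_request(name, release_label, instance_type, core_count, iops):
    """Assemble keyword arguments for boto3's EMR run_job_flow."""
    ebs = {
        "EbsBlockDeviceConfigs": [{
            "VolumeSpecification": {
                "VolumeType": "io1",  # provisioned-IOPS volume
                "Iops": iops,
                "SizeInGB": 100,      # hypothetical size
            },
            "VolumesPerInstance": 1,
        }]
    }
    return {
        "Name": name,
        "ReleaseLabel": release_label,  # placeholder, e.g. "emr-x.y.z"
        "Applications": [{"Name": "Hadoop"}, {"Name": "Hive"}, {"Name": "Pig"}],
        "Instances": {
            "InstanceGroups": [
                {"Name": "master", "InstanceRole": "MASTER", "Market": "ON_DEMAND",
                 "InstanceType": instance_type, "InstanceCount": 1},
                {"Name": "core", "InstanceRole": "CORE", "Market": "ON_DEMAND",
                 "InstanceType": instance_type, "InstanceCount": core_count,
                 "EbsConfiguration": ebs},
            ],
            # Shut the cluster down once its steps finish.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        # EMR's default IAM role names.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


request = build_cluster_request("log-analysis", "emr-x.y.z", "m5.xlarge", 2, 3000)
print(request["Instances"]["InstanceGroups"][1]["InstanceCount"])
```

With credentials configured and a real release label, the request would be sent as `boto3.client("emr").run_job_flow(**request)`.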

Page 23: Elastic map reduce

A quick tool comparison

• EMR = Hadoop (any distribution)

• Data Pipeline = Oozie

• Kinesis = Flume, Kafka, Scribe + Storm

• Lambda = Kafka + Storm/Spark + NoSQL/SQL

• Amazon EMR = Spark MLlib

Page 24: Elastic map reduce

Questions and Answers

Page 25: Elastic map reduce

THANK YOU