Big Data at IDC Madrid

Which technologies are underpinning Big Data?

What are their challenges?

Jordi Torres

UPC/BSC

Madrid - 18/09/2012

A tide of information

0.8 zettabytes

2020: 35.2 zettabytes

(*) 1 zettabyte (ZB) = 1,000,000,000,000 GB

1 gigabyte (GB) = 1,000,000,000 bytes
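As a quick sanity check of the figures and unit definitions on this slide, the following minimal sketch converts zettabytes to gigabytes using the decimal definitions given in the footnote and computes the growth factor between the two quoted volumes. The constant and function names are illustrative assumptions, not part of the talk.

```python
# Decimal (SI) unit definitions, as given in the slide footnote.
BYTES_PER_GB = 10**9
GB_PER_ZB = 10**12
BYTES_PER_ZB = BYTES_PER_GB * GB_PER_ZB  # 10**21 bytes


def zb_to_gb(zettabytes: float) -> float:
    """Convert a volume in zettabytes to gigabytes."""
    return zettabytes * GB_PER_ZB


# The two volumes quoted on the slide.
earlier_zb = 0.8
forecast_2020_zb = 35.2

growth_factor = forecast_2020_zb / earlier_zb

print(f"{earlier_zb} ZB = {zb_to_gb(earlier_zb):.1e} GB")
print(f"{forecast_2020_zb} ZB = {zb_to_gb(forecast_2020_zb):.1e} GB")
print(f"Growth factor: about {growth_factor:.0f}x")
```

In other words, the 2020 forecast is about 44 times the earlier figure.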

3 Vs and …

THE QUESTION!

Can we cope with all of it?

What is already being offered and researched?

Data volume: from GBs to PBs

The capacity of current technologies has been overwhelmed!

Storage

Management

Processing

Analysis

Storage

HDD: 100 times cheaper than RAM

But 1,000 times slower

More, and faster!

Current offering: solid-state drive (SSD), which is also non-volatile

Research: Storage Class Memory (SCM)

Management

Relational databases cannot cope with everything!

e.g. “schemas” or the ACID properties:

Atomicity, Consistency, Isolation & Durability
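To make the ACID guarantees concrete, here is a minimal sketch of atomicity using Python's built-in `sqlite3` module. The `accounts` table, the `transfer` helper and the sample balances are hypothetical and only serve the illustration. The transfer credits one account and then debits the other, and a `CHECK` constraint makes the debit fail when funds are insufficient, so the earlier credit must be rolled back.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Credit dst, then debit src, as one transaction: both apply or neither does."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the credit made above has been rolled back


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
with conn:
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )

ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "bob", "alice", 500)  # debit violates the CHECK constraint

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

After the failed transfer the balances are unchanged, even though the first `UPDATE` of that transfer had already been executed. Guarantees of this kind are what become expensive to maintain across many machines.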

Current offering: “NoSQL systems”

Example: Facebook

New BASE properties: Basically Available, Soft state, Eventual consistency

Research: self-* NoSQL systems
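The idea behind eventual consistency can be sketched with two in-memory replicas that accept writes locally and reconcile later. This is a toy model under stated assumptions: the `Replica` class, the explicit version numbers and the "highest version wins" rule are illustrative choices, not a description of how Facebook or any particular NoSQL system works.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One replica holding key -> (version, value); the higher version wins."""

    store: dict = field(default_factory=dict)

    def write(self, key, value, version):
        current = self.store.get(key)
        if current is None or version > current[0]:
            self.store[key] = (version, value)

    def read(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]

    def sync_from(self, other):
        """Anti-entropy step: pull every entry the other replica holds."""
        for key, (version, value) in other.store.items():
            self.write(key, value, version)


a, b = Replica(), Replica()

a.write("status", "online", version=1)  # accepted locally, not yet replicated
stale = b.read("status")                # replica b has not seen the write yet
b.write("status", "away", version=2)    # a later write lands on replica b

# The replicas exchange state in the background and converge.
a.sync_from(b)
b.sync_from(a)
converged = a.read("status") == b.read("status")
print(stale, a.read("status"), converged)
```

Each replica stays available for reads and writes at all times (Basically Available), a read may return stale or missing data for a while (Soft state), and the replicas agree once they have exchanged their updates (Eventual consistency).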

Processing

Massively parallel, distributed and fault-tolerant environments

New programming models are needed

Current offering:

“THE” solution, in open source and proprietary implementations

Thinking in MapReduce is hard

We need to “UNLEARN”
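To show what "thinking in MapReduce" means, here is a minimal single-process sketch of the classic word-count example, split into map, shuffle and reduce phases. The function names and sample documents are illustrative assumptions. A real framework would run the map and reduce calls in parallel across many machines and handle failures, which this sketch does not attempt.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


documents = ["big data big challenges", "data beats methods"]

mapped = chain.from_iterable(map_phase(d) for d in documents)
grouped = shuffle_phase(mapped)
counts = dict(reduce_phase(key, values) for key, values in grouped.items())
print(counts)
```

The "unlearning" lies in expressing the whole computation as independent per-record map calls and per-key reduce calls, with no shared state between them.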

Solution (open source)

Levels of abstraction:

HBase/Cassandra (NoSQL system)

(SQL-based language)

(data flow language)

Solution (industry)

For example, SQL+NoSQL:

e.g. integrating MapReduce functionality

MapReduce connectors for the data warehouse (DW)

Diagram labels: normalized data; data warehouse; business users, business analysts, etc.; Hadoop ODBC driver

Research

The scenario we will have: IN-MEMORY / APPLICATION

Integrated management of:

• the storage hierarchy,

• transparent to the user

• self-managed so that it can be optimized

• …

Analysis

Probably THE MOST IMPORTANT CHALLENGE for you!

Analysis: from information to knowledge

Current offering: Data Mining, Machine Learning

Research: most algorithms run well on thousands of records, but today they are impractical on billions. We are working on it!
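A back-of-the-envelope calculation illustrates why an algorithm that is fine on thousands of records can become impractical on billions. The sketch below assumes, purely for illustration, an algorithm that compares every pair of records (quadratic cost) and a single machine performing one billion comparisons per second. Neither assumption comes from the talk.

```python
def pairwise_comparisons(n: int) -> int:
    """Number of distinct pairs among n records: n * (n - 1) / 2."""
    return n * (n - 1) // 2


OPS_PER_SECOND = 10**9  # assumed throughput of one machine, illustration only
SECONDS_PER_YEAR = 365 * 24 * 3600


def years_needed(n: int) -> float:
    """Time to compare all pairs at the assumed throughput, in years."""
    return pairwise_comparisons(n) / OPS_PER_SECOND / SECONDS_PER_YEAR


for n in (10**3, 10**9):
    print(f"{n:>13,} records: {pairwise_comparisons(n):.2e} comparisons, "
          f"{years_needed(n):.2e} years")
```

Under these assumptions a thousand records need well under a millisecond, while a billion records need more than a decade on one machine.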

A reflection of current advances?

Or is everyone keeping to themselves …

Source: http://www.smartplanet.com/blog/business-brains/retailer-or-a-data-company-wal-mart-is-now-both/20850

“Oddly, machine learning research mirrors the way cryptography research developed around the middle of the 20th century. Much of the cutting edge research was done in secret, and we’re only finding out now, 40 or 50 years later, what GCHQ or the NSA was doing back then. I’m hopeful that it won’t take quite that long for Amazon or Google to tell us what they’re thinking about today.” (p. 49) Alasdair Allan, senior research fellow in Astronomy at the University of Exeter

Categorization (unsupervised):

• K-means clustering

• Association rules

• …

Regression:

• Linear

• Logistic

Classification (supervised):

• Naïve Bayesian classifier

• Decision trees

Time series analysis

Text analysis
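As an illustration of the first technique in the list, here is a minimal pure-Python sketch of k-means clustering (Lloyd's algorithm). The sample points, the fixed number of iterations and the choice of initial centroids are assumptions made for the example. Production libraries add smarter initialization and convergence checks.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: alternate an assignment step and a centroid update."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)), key=lambda i: math.dist(p, centroids[i])
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster
            else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, centroids=[points[0], points[3]])
print(centroids)
print(clusters)
```

Each iteration visits every point once per centroid, so the cost grows with the number of records, the number of clusters and the number of iterations.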

Is it essential for your business to master Machine Learning technologies in-house?

Or perhaps not?

Does the method really matter in Big Data?

Example: text processing

Classic example: for Banko and Brill (2001), it seems that the data matter more than the methods?

In short …

“machine learning algorithms really don’t matter, all that matters is the amount of data you have”

What do you think?

Are these skills necessary?

My view:

(we are not yet at the Cloud's level of maturity)

Data Analysis & Prediction

Big Data

Cloud Computing

Smart Computing

… WHICH IS WHAT I WILL BE TEACHING IN BARCELONA!

More information on the topic:

Editorial UOC, October 2012

Creative Commons 3.0

More information about the author:

Professor and researcher in new ICT technologies

Acts as an expert for various public organizations

Technology consultant, member of a board of directors

Gives talks and collaborates with various media outlets

www.JordiTorres.eu

@JordiTorresBCN

Thank you for your attention!

www.bsc.es/eBusiness

But … especially to:
