apache big data conference · 2017-12-14 · apache big data conference how to transform data into...


Page 1

APACHE BIG DATA CONFERENCE

How to transform data into money using Big Data technologies

Page 2

After almost a decade of developing Big Data projects at Paradigma, we became early adopters of Spark through its R&D department, which led to the creation of Stratio.

THE FIRST SPARK-BASED BIG DATA PLATFORM RELEASED

INTRO

Page 3

JORGE LOPEZ-MALLA

After working with traditional processing methods, I started doing some R&D Big Data projects and fell in love with the Big Data world. I'm currently working on some awesome Big Data projects at Stratio.

MY PROFILE

SKILLS

Page 4

ALBERTO RODRÍGUEZ DE LEMA

I've been programming for more than 10 years since graduating. I've built high-performance, scalable web applications for companies such as Indra Systems, Prudential and Springer Verlag Ltd.

MY PROFILE

@ardlema

SKILLS

Page 5


GO TO SPACE

STRATIO

OPEN-SOURCE SOLUTIONS

Our enterprise solutions are based on open-source technologies

PURE SPARK

The only pure Spark platform, the only global solution

ENTERPRISE SPARK

On-premise & cloud, our platform is geared towards helping companies

SPARK-BASED BD PLATFORM

The first Spark-based big data platform released

Page 6

OUR CLIENT

MIDDLE EAST TELCO COMPANY

- 9,500 mil. daily events processed

- 9.2 mil. clients

Page 7

USE CASES

Page 8

MANAGEMENT & NORMALIZATION OF DATA SOURCES

USE CASES

1

Page 9

USE CASES

MANAGEMENT & NORMALIZATION OF DATA SOURCES

1

Page 10

USE CASES

NETWORK COVERAGE IMPROVEMENT

2

Page 11

USE CASES

PEOPLE GATHERING

3

Page 12

USE CASES

PEOPLE GATHERING

3

Page 13

USE CASES

DATA MONETIZATION

4

Page 14

USE CASES

DATA MONETIZATION

4

Page 15

Page 16

Page 17

Page 18

DATA MONETIZATION

4

USE CASES

Page 19

TECHNICAL CHALLENGES

Page 20

TECHNICAL PROBLEMS

1 Huge volume of data

2 Huge size of data

3 Distributed processing

4 Hard to read

5 Recognized patterns

Page 21

1 HUGE VOLUME OF DATA

SOLUTION: APACHE HADOOP DISTRIBUTED FILE SYSTEM (HDFS)

Page 22

1 HUGE VOLUME OF DATA

9,500 mil. daily CSV records -> circa 16 GB

Requirements:

High availability

Concurrent file reads
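As a rough illustration of why HDFS meets these two requirements, the sketch below estimates how one day of CSV data would be split into blocks and replicated. It is a minimal back-of-the-envelope model assuming the Hadoop 2.x+ defaults of a 128 MB block size (`dfs.blocksize`) and a replication factor of 3 (`dfs.replication`); the deck does not state the cluster's actual settings. It also treats the day as a single file of 16.5 GB (the daily CSV figure from the Parquet comparison), whereas real ingestion usually produces many files, each occupying at least one block of its own. Replicated blocks keep data readable when a DataNode fails, and independent blocks let many tasks read the same dataset concurrently.

```python
import math

# Assumed values: Hadoop 2.x+ defaults, not confirmed by the slides.
BLOCK_SIZE_MB = 128   # dfs.blocksize default (134217728 bytes)
REPLICATION = 3       # dfs.replication default
DAILY_CSV_GB = 16.5   # daily CSV volume quoted in the deck


def hdfs_footprint(file_gb: float, block_mb: int = BLOCK_SIZE_MB,
                   replication: int = REPLICATION) -> dict:
    """Estimate how HDFS splits and replicates one file of `file_gb` gigabytes."""
    file_mb = file_gb * 1024
    blocks = math.ceil(file_mb / block_mb)  # the last block may be partially filled
    return {
        "blocks": blocks,                         # units that can be read in parallel
        "replicas": blocks * replication,         # block copies spread across DataNodes
        "raw_storage_gb": file_gb * replication,  # disk space actually consumed
    }


if __name__ == "__main__":
    print(hdfs_footprint(DAILY_CSV_GB))
```

Under these assumptions a single day yields 132 blocks, so up to 132 tasks could scan it at once, at the cost of roughly three times the logical size on disk.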

Page 23

2 HUGE SIZE OF DATA

SOLUTION: APACHE PARQUET

Page 24

2 HUGE SIZE OF DATA

16.5 GB of daily event data stored as CSV text in HDFS

4.3 GB of daily event data stored as Parquet files in HDFS

STORAGE IMPROVEMENT: circa 70%
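Part of Parquet's space saving comes from storing each column contiguously, which lets repeated values be encoded compactly; dictionary encoding is one of the encodings the Parquet format supports. The toy below illustrates only the dictionary idea on a hypothetical column of cell identifiers. It is not Parquet's on-disk layout, which additionally applies run-length/bit-packing to the codes and optional compression codecs.

```python
from typing import Dict, Hashable, List, Sequence, Tuple


def dictionary_encode(column: Sequence[Hashable]) -> Tuple[List[Hashable], List[int]]:
    """Map each value to an index into a list of the column's distinct values."""
    dictionary: List[Hashable] = []
    index_of: Dict[Hashable, int] = {}
    codes: List[int] = []
    for value in column:
        if value not in index_of:
            index_of[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index_of[value])
    return dictionary, codes


def dictionary_decode(dictionary: List[Hashable], codes: List[int]) -> List[Hashable]:
    """Rebuild the original column from the dictionary and the index codes."""
    return [dictionary[code] for code in codes]


if __name__ == "__main__":
    # Hypothetical column of cell identifiers; event data tends to repeat such values.
    cells = ["CELL-0042", "CELL-0042", "CELL-0007", "CELL-0042", "CELL-0007"]
    dictionary, codes = dictionary_encode(cells)
    print(dictionary)  # ['CELL-0042', 'CELL-0007']
    print(codes)       # [0, 0, 1, 0, 1]
    assert dictionary_decode(dictionary, codes) == cells

    raw_bytes = sum(len(v) for v in cells)  # 45 characters as plain text
    # Assumes one byte per code, enough for up to 256 distinct values.
    encoded_bytes = sum(len(v) for v in dictionary) + len(codes)  # 18 + 5 = 23
    print(raw_bytes, encoded_bytes)
```

The fewer distinct values a column has relative to its length, the larger the saving, which is why low-cardinality fields in event logs shrink so well in a columnar format.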

Page 25

2 HUGE SIZE OF DATA

Time to count daily CSV events -> 6.2 minutes

Time to count daily Parquet events -> 1 minute

READ PROCESS IMPROVEMENT: circa 80%
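The rounded percentages on the storage and read slides follow directly from the quoted measurements. This small calculation, which uses only the numbers given in the deck, shows that the exact reductions come out slightly above the "circa" values.

```python
def reduction(before: float, after: float) -> float:
    """Fractional reduction when a measurement drops from `before` to `after`."""
    return 1 - after / before


if __name__ == "__main__":
    storage = reduction(16.5, 4.3)  # GB per day: CSV vs Parquet
    read = reduction(6.2, 1.0)      # minutes to count one day of events
    print(f"storage: {storage:.1%}, read: {read:.1%}")  # storage: 73.9%, read: 83.9%
```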

Page 26

3 DISTRIBUTED PROCESSING

SOLUTION: APACHE SPARK

Page 27

3 DISTRIBUTED PROCESSING - REQUIREMENTS

Complex algorithms run with the minimum amount of resources

Reduced processing time, so that results are available while the data is still useful

Page 28

3 DISTRIBUTED PROCESSING - REQUIREMENTS

Sharing the cluster with legacy processes

Reuse the outputs of legacy processes without changing those processes

Page 29

4 HARD TO READ

SOLUTION: SCALA + APACHE SPARK

Page 30

4 HARD TO READ

Reduced development time

Lines of code (LOC) dramatically reduced

Number of classes dramatically reduced

Page 31

Improved readability of tests and application code

DSLs make our lives easier

Spark makes MapReduce jobs even simpler

4 HARD TO READ
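The MapReduce pattern mentioned on this slide boils down to a map step that emits key/value pairs and a reduce step that combines the values for each key. In Spark's Scala API this is roughly `lines.map(l => (l.split(",")(1), 1)).reduceByKey(_ + _)`. The sketch below mimics the same two steps in plain, single-machine Python on hypothetical `timestamp,cell_id` event lines; it shows only the shape of the computation, not Spark's distributed execution or the authors' code.

```python
from collections import defaultdict
from functools import reduce
from typing import Callable, Dict, Iterable, List, Tuple, TypeVar

K = TypeVar("K")
V = TypeVar("V")


def reduce_by_key(pairs: Iterable[Tuple[K, V]],
                  combine: Callable[[V, V], V]) -> Dict[K, V]:
    """Group (key, value) pairs by key and fold each group with `combine`,
    mirroring what Spark's reduceByKey computes, but on a single machine."""
    groups: Dict[K, List[V]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: reduce(combine, values) for key, values in groups.items()}


if __name__ == "__main__":
    # Hypothetical CSV event lines in the format "timestamp,cell_id".
    lines = [
        "10:00,CELL-1",
        "10:01,CELL-2",
        "10:02,CELL-1",
    ]
    # Map step: parse each line and emit (cell_id, 1).
    pairs = ((line.split(",")[1], 1) for line in lines)
    # Reduce step: sum the counts per cell.
    print(reduce_by_key(pairs, lambda a, b: a + b))  # {'CELL-1': 2, 'CELL-2': 1}
```

In a hand-written Hadoop MapReduce job, the same logic typically needs separate mapper, reducer and driver classes, which is the kind of class and LOC reduction the previous slide refers to.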

Page 32

5 RECOGNIZED PATTERNS

SOLUTION: APACHE SPARK MLLIB

Page 33

Millions of records processed to build mathematical models

Complex mathematical algorithms applied to obtain accurate weekly behavior patterns

5 RECOGNIZED PATTERNS
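The slides do not say which MLlib algorithms were used. As a hedged illustration of the kind of input a weekly-behavior model needs, the sketch below turns one user's event timestamps into a 168-slot weekly profile (one slot per weekday and hour), normalized to sum to 1. Vectors like this could then be fed to a clustering algorithm such as MLlib's k-means. The feature design and the sample timestamps are assumptions for illustration, not the authors' method or data.

```python
from datetime import datetime
from typing import Iterable, List

SLOTS = 7 * 24  # one slot per (weekday, hour) pair


def weekly_profile(timestamps: Iterable[datetime]) -> List[float]:
    """Histogram of events per weekday-hour slot, normalized to sum to 1."""
    counts = [0] * SLOTS
    for ts in timestamps:
        counts[ts.weekday() * 24 + ts.hour] += 1  # Monday 00h -> slot 0
    total = sum(counts)
    # An empty history yields an all-zero profile instead of dividing by zero.
    return [c / total for c in counts] if total else [0.0] * SLOTS


if __name__ == "__main__":
    # Hypothetical events for a single user.
    events = [
        datetime(2016, 5, 9, 8, 15),   # Monday 08h
        datetime(2016, 5, 9, 8, 40),   # Monday 08h
        datetime(2016, 5, 14, 21, 5),  # Saturday 21h
    ]
    profile = weekly_profile(events)
    print(profile[8], profile[5 * 24 + 21])
```

Normalizing each profile makes users with very different event volumes comparable, so a clustering step groups them by when they are active rather than by how active they are.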

Page 34

THANK YOU

UNITED STATES

Tel: (+1) 408 5998830

EUROPE

Tel: (+34) 91 828 64 73


www.stratio.com