Big Data Analytics without Hadoop? by Dr. Bernhard Sünder, Managing Director, AMS GmbH


20.06.2017 Big Data Analytics without Hadoop? Dr. B. Sünder, AMS GmbH 2

AMS GmbH, located in Chemnitz (Saxony), founded in 1993 by Dr. B. Sünder

Since 1998 our vision has been:

Using Internet technologies for distributed workflows in measurement data post-processing


Why are we here?

1. We work with measurement data
2. The amount of data is growing extremely fast
   1. The content of the data files grows
      1. >20,000 channels per file
      2. >1 GB per channel → 20 TB
   2. The number of files grows (EvoBus: 100,000 files per month)
3. Old Microsoft desktop technologies are no solution
   1. Windows file system as a database
   2. Windows desktop tools for analysis and reporting


Are Big-Data Technologies a Solution?

A lot of Big Data technologies from Silicon Valley are flooding the market:

1. Hadoop MapReduce

2. HDFS: Hadoop Distributed File System

3. Lucene / Elasticsearch: Database with indexing technology

4. Parquet: Data File Format

5. Tableau: Analysis and Visualization

Plus a lot of derived software


Big Data vs Big Test Data

• Big Data is used mainly for business / office data
  – Google data
  – Amazon data
• But is it useful for measurement data?
• What are the differences between the two?


Big Test Data for Measurement Data

• Millions of files, which are naturally sliced; no further Hadoop slicing necessary
• Quantities versus numbers
• Analysis functions that need up to 100% overlap
• Metadata definitions describe the use case
• File formats backed by 20 years of expertise
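The bullet on overlapping can be illustrated with a sliding-window calculation. The following is a minimal sketch, assuming a moving average as the analysis function (the slide names no specific function; all names and values are illustrative). With a window of `window` samples and a step of one sample, consecutive windows overlap almost completely, so a channel that is cut into independent chunks loses every window that straddles a cut unless each chunk also carries `window - 1` samples of its neighbour.

```python
def moving_average(samples, window):
    """Mean of every full window; consecutive windows overlap by window-1 samples."""
    return [sum(samples[i:i + window]) / window
            for i in range(len(samples) - window + 1)]

def chunked_moving_average(samples, window, chunk_size, overlap):
    """Apply moving_average chunk by chunk. Each chunk is extended by
    `overlap` samples taken from the start of the following chunk."""
    results = []
    for start in range(0, len(samples), chunk_size):
        chunk = samples[start:start + chunk_size + overlap]
        results.extend(moving_average(chunk, window))
    return results

channel = [float(x) for x in range(10)]   # hypothetical channel data
window = 4

full = moving_average(channel, window)
naive = chunked_moving_average(channel, window, chunk_size=5, overlap=0)
padded = chunked_moving_average(channel, window, chunk_size=5,
                                overlap=window - 1)

# Number of windows produced by each variant.
print(len(full), len(naive), len(padded))   # prints: 7 4 7
```

Cutting without overlap silently drops three of the seven windows here; only the padded variant reproduces the result of the unsplit calculation.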


Hadoop: MapReduce

• Slicing files into independent parts
  – Not needed; we have files for each test
• Processing each part independently
  – Parallelism is the only way to get performance
• Aggregating individual results into a common result
  – Several aggregation methods are needed
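The three steps on this slide can be sketched without Hadoop when every test already has its own file. This is only an illustration under stated assumptions: the `FILES` dictionary stands in for real measurement files, and `analyze_file` and `aggregate` are hypothetical names, not part of jBEAM or Hadoop.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for measurement files: one entry per test,
# mapping a file name to the samples of a single channel.
FILES = {
    "test_001.dat": [1.0, 4.0, 2.5],
    "test_002.dat": [3.0, 9.5, 0.5],
    "test_003.dat": [7.0, 2.0],
}

def analyze_file(name):
    """'Map' step: reduce one file to a small partial result."""
    samples = FILES[name]
    return {"file": name, "max": max(samples)}

def aggregate(partials):
    """'Reduce' step: combine the per-file results into one common result."""
    return max(p["max"] for p in partials)

# No slicing step: the files themselves are the independent parts.
with ThreadPoolExecutor(max_workers=3) as pool:
    partials = list(pool.map(analyze_file, FILES))

overall_max = aggregate(partials)
print(overall_max)   # prints: 9.5
```

Threads are used here only to keep the sketch short. For CPU-bound analysis a real setup would distribute the files over processes or machines, but the map and aggregate structure stays the same.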


HDFS: Hadoop Distributed File System

• 2016: due to replication, only cheap storage is needed
  – Comparison of IT-managed storage with Saturn HD
• 2017: IT-managed HDFS storage at the same price
  – The statements change quickly
• Replication of data (factor of 3)
  – Needs 3 times the storage capacity
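The storage argument on this slide comes down to simple arithmetic. As a rough sketch (the function names are illustrative and the 100 TB figure is a made-up example): with a replication factor of 3, every terabyte of data occupies three terabytes of raw disk, so replicated storage only pays off if its price per terabyte is below one third of the price of storage that holds a single copy.

```python
def raw_capacity_tb(usable_tb: float, replication_factor: int = 3) -> float:
    """Raw disk capacity needed when every block is stored
    `replication_factor` times."""
    return usable_tb * replication_factor

def break_even_price_ratio(replication_factor: int = 3) -> float:
    """Replicated storage is cheaper overall only if its price per TB is
    below this fraction of the price per TB of single-copy storage."""
    return 1 / replication_factor

# Hypothetical archive of 100 TB of measurement data.
print(raw_capacity_tb(100.0))               # prints: 300.0
print(round(break_even_price_ratio(), 3))   # prints: 0.333
```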


Indexing Database: Lucene or Elasticsearch

• Lucene: a big step, but limited index space
  – Great advantages compared with RDBs
• Elasticsearch: distributed Lucene, no limit
  – 95% technology
• Full-text search: intuitive, vague search
• Faceted search: select from a given list
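The two search styles named on this slide can be illustrated with a toy inverted index. This is a sketch of the idea only, not of how Lucene or Elasticsearch are implemented or called; the records in `TESTS` and all function names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical metadata records of three tests.
TESTS = [
    {"id": 1, "vehicle": "bus", "site": "Stuttgart",
     "title": "brake test on wet road"},
    {"id": 2, "vehicle": "bus", "site": "Detroit",
     "title": "engine cold start"},
    {"id": 3, "vehicle": "truck", "site": "Stuttgart",
     "title": "brake fade test"},
]

def build_index(records, field):
    """Inverted index: lower-cased word -> set of record ids containing it."""
    index = defaultdict(set)
    for rec in records:
        for word in rec[field].lower().split():
            index[word].add(rec["id"])
    return index

def search(index, query):
    """Full-text search: ids of records that contain every query word."""
    hits = [index.get(word, set()) for word in query.lower().split()]
    return set.intersection(*hits) if hits else set()

def facet_counts(records, ids, field):
    """Faceted search: number of hits under each value of `field`."""
    return Counter(rec[field] for rec in records if rec["id"] in ids)

index = build_index(TESTS, "title")
hits = search(index, "brake test")
print(sorted(hits))                               # prints: [1, 3]
print(dict(facet_counts(TESTS, hits, "vehicle")))
```

The facet counts are what a user interface would offer as the "given list" to narrow the hits further. The sketch matches exact words only; the vague matching mentioned on the slide is not covered.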


Parquet: Data File Format

• The Parquet introduction on the Apache website: “We invented the column based storage”
• Alternative facts!
• In the measurement data world, column-based storage has existed for 20 years (ATF(X), DAT, …)
• We see no relevance in such a new file format
• Just use the existing data file formats, even if you use HDFS
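What "column-based" means for measurement data can be shown with a small in-memory example. It is a sketch with hypothetical channel names and values and does not reproduce the layout of Parquet, ATF(X) or DAT files: in a column layout each channel is stored as one block, so reading a single channel does not require visiting every record.

```python
# Hypothetical measurement: three channels, four samples each.
ROWS = [
    {"time": 0.0, "speed": 10.0, "temp": 80.1},
    {"time": 0.1, "speed": 10.5, "temp": 80.3},
    {"time": 0.2, "speed": 11.2, "temp": 80.2},
    {"time": 0.3, "speed": 11.9, "temp": 80.6},
]

def to_columns(rows):
    """Row-based -> column-based: one list per channel."""
    return {name: [row[name] for row in rows] for name in rows[0]}

def to_rows(columns):
    """Column-based -> row-based: one record per sample."""
    names = list(columns)
    return [dict(zip(names, values)) for values in zip(*columns.values())]

columns = to_columns(ROWS)

# Row layout: every record has to be visited to extract one channel.
speed_from_rows = [row["speed"] for row in ROWS]
# Column layout: the channel is already available as one block.
speed_from_columns = columns["speed"]

print(max(speed_from_columns))   # prints: 11.9
```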


Tableau: Analysis and Visualization

• Tableau has a fantastically nice user interface
• There is only one feature: pivot analysis
• Ideal for row-based sales data (see Excel)
• But for measurement data it is only one out of a hundred calculations
• A good analysis tool has pivot available, too
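For readers unfamiliar with the term, a pivot analysis groups row-based records by two keys and aggregates a value per cell. The sketch below uses made-up sales rows and a hypothetical `pivot_sum` helper; it shows the operation itself, not Tableau's or Excel's implementation.

```python
from collections import defaultdict

# Hypothetical row-based sales data, as kept in a spreadsheet:
# (region, quarter, amount)
SALES = [
    ("North", "Q1", 100.0),
    ("North", "Q2", 150.0),
    ("South", "Q1", 80.0),
    ("South", "Q2", 120.0),
    ("North", "Q1", 50.0),
]

def pivot_sum(rows):
    """Pivot table: one row per region, one column per quarter,
    each cell holding the sum of the matching amounts."""
    table = defaultdict(lambda: defaultdict(float))
    for region, quarter, amount in rows:
        table[region][quarter] += amount
    return {region: dict(cols) for region, cols in table.items()}

pivot = pivot_sum(SALES)
for region in sorted(pivot):
    print(region, pivot[region])
```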


Conclusion on Big-Data Technologies

Technology                        Big-Data   Big-Test-Data
Hadoop MapReduce                  ++         -
Hadoop Distributed File System    ++         + (?)
Parquet                           +          -
Tableau                           +          -
Elasticsearch / Lucene            ++         ++


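Measurement Data Management

[Figure: architecture diagram. Labels that survive from the diagram:
– Platforms: Windows, Linux, Mac, iOS, Android, iPad
– Components: jBEAM Server, jBEAM Client, iBEAM Client, jBEAM Desktop, MaDaM, MaDaM Importer, Elasticsearch database, Web Service
– Functions: Interactive Analysis by jBEAM, pure HTML-5, Search & Find, Standard Reports
– Data sources and formats: Test, Simulation, ATFX, MDF4
– Note: "optimized traffic with" (remainder not preserved)]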


MaDaM and jBEAM: the partners for Big-Data technologies

• All the Hadoop technologies are available as Java libraries
• MaDaM and jBEAM are both Java tools
• Using Big Data technologies is easy for us
• Today we can show you how to read MDF files from HDFS


Parallelization

[Figure: three levels of parallelization]

• Level 3: multiple data lakes with multiple MDMs (Shanghai, Stuttgart, Detroit)
• Level 2: one data lake with multiple jBEAMs (labels: User, N x jBEAM, Cluster)
• Level 1: multiple threads within one jBEAM calculation (labels: Data channel, Split, Calculation with CalcT a, CalcT b, CalcT c, Aggregation, Result)

Further labels in the figure: jBEAM Server, jBEAM Client, jBEAM


jBEAM-Cluster – Parallel Processing

Aggregated Reports: jBEAM-generated PDF files

N jBEAMs run in a cluster and analyze the data file by file. The node results are aggregated into a common result.
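Aggregating node results into a common result is only correct if each node reports values that can be merged. The sketch below is an assumption about how such a merge could look, not a description of jBEAM's internals; `Partial`, `summarize` and `merge` are hypothetical names. It shows why a mean has to be carried as sum and count: averaging the node means gives a different, wrong answer when nodes processed different numbers of samples.

```python
from dataclasses import dataclass
from functools import reduce

@dataclass
class Partial:
    """Mergeable statistics of one channel as reported by one node."""
    count: int
    total: float
    minimum: float
    maximum: float

    @property
    def mean(self) -> float:
        return self.total / self.count

def summarize(samples):
    """Node-side step: condense the samples of one file."""
    return Partial(len(samples), sum(samples), min(samples), max(samples))

def merge(a, b):
    """Aggregation step: combine two partial results."""
    return Partial(a.count + b.count, a.total + b.total,
                   min(a.minimum, b.minimum), max(a.maximum, b.maximum))

# Hypothetical results from two nodes with different amounts of data.
node_results = [summarize([1.0, 2.0, 3.0]), summarize([10.0])]

combined = reduce(merge, node_results)
mean_of_means = sum(p.mean for p in node_results) / len(node_results)

print(combined.mean)    # prints: 4.0 (correct overall mean)
print(mean_of_means)    # prints: 6.0 (not the overall mean)
```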


Multi File Operation Mode Analysis (I)


Multi File Operation Mode Analysis (II)


Multiple-MaDaM Solution

[Figure: three sites (USA, Germany, China) connected over long distance. Each site shows: Web Browser, jBEAM Client, MaDaM Importer, File System, jBEAM Server (HTML-5), MaDaM™, Lucene Database.]

IP traffic between the components, relative to a complete file upload:

• 1e-6 = 0.0001%: only metadata exchange
• 1e-3 = 0.1%: EnCom-minimized traffic *)
• 1e0 = 100%: complete file upload

*) optimized traffic with

Client components:

• Search for Tests & Preview: modern interactive web interface accessible from any browser
• Standardized Reports: server-jBEAM-generated PDF files can be viewed with a PDF reader
• Interactive Analysis: jBEAM with Java Web Start running on the client desktop
• Import of Tests: MaDaM Importer with Java Web Start running on the client desktop
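The traffic ratios quoted for the Multiple-MaDaM solution (1e-6, 1e-3 and 1e0) can be turned into concrete numbers. This is a small worked example under the assumption of a 20 GB measurement file; the file size, the mode names and the function name are illustrative and not taken from the slides.

```python
# Divisors corresponding to the traffic ratios quoted on the slide.
TRAFFIC_DIVISOR = {
    "metadata_only": 10**6,     # 1e-6 = 0.0001 %
    "encom_minimized": 10**3,   # 1e-3 = 0.1 %
    "full_upload": 1,           # 1e0  = 100 %
}

def bytes_on_the_wire(file_size_bytes: int, mode: str) -> int:
    """Approximate number of bytes sent over the long-distance link."""
    return file_size_bytes // TRAFFIC_DIVISOR[mode]

FILE_SIZE = 20 * 10**9   # hypothetical 20 GB measurement file

for mode in TRAFFIC_DIVISOR:
    print(f"{mode:16s} {bytes_on_the_wire(FILE_SIZE, mode):>14,d} bytes")
```

For this file size, a metadata exchange moves about 20 kB, the minimized traffic about 20 MB, and a complete upload the full 20 GB.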


Conclusion

• Use the right technologies from Big Data
• Combine them with sophisticated technologies from the measurement world
• And you get the right solution for Big Test Data


Come to our booth #1822

• The brand new MaDaM2

– Elasticsearch

– Easy & new user-interface

• Hadoop (HDFS)

– ASAM-MDF file import

• jBEAM-Cluster

– 6 cheap PCs working in parallel

NAS


… and if you are lucky

Win a local flight around Stuttgart with our business plane.

We will take off from Stuttgart airport directly after the exhibition closes.


Gesellschaft für angewandte Mess- und Systemtechnik mbH
Bahnhofstraße 6
09111 Chemnitz
Germany
Tel.: +49 (371) 918 668-0
Fax: +49 (371) 918 668-99
E-Mail: [email protected]
Web: www.AMSonline.de

North America Inc.
1760 Opdyke Court
Auburn Hills, MI 48326
USA
Tel.: +1 (248) 270-7779
Fax: +1 (248) 393-0340
E-Mail: [email protected]
Web: www.AMSonline.eu

Liaison Office Shanghai
German Centre, Unit 719A
88 Keyuan Road, Pudong
Shanghai 201203 / PR China
Tel.: +86 (21) 289 866 19
Fax: +86 (21) 289 865 11
E-Mail: [email protected]
Web: www.AMSonline.cn