PROTEUS H2020 at Ficloud2016



Page 1: PROTEUS H2020 at Ficloud2016

An incremental approach for real-time Big Data visual analytics

Nacho García Fernández

Treelogic S.L.

[email protected]

August 23, 2016

Nacho García Fernández (Treelogic S.L.) BigR&I 2016 August 23, 2016 1 / 23

Page 2: PROTEUS H2020 at Ficloud2016

About me

Academics

BSc in Computer Science

MSc in Computer Science

PhD Student

Professional

R&D Engineer at Treelogic S.L.

Lecturer at Master of Big Data (KSchool)

Others

Computer security enthusiast


Page 3: PROTEUS H2020 at Ficloud2016

About my company: Treelogic

R&D-intensive company with the mission of adapting technological knowledge to improve quality standards in our daily life

8 ongoing H2020 projects (coordinating 3 of them)

8 ongoing FP7 projects (coordinating 5 of them)

Focused on providing Big Data analytics worldwide

Internal Organisation

Research Lines

Big Data

Computer Vision

Data Science

Social Media Analysis

Security

ICT Solutions

Security & Safety

Justice

Health

Transport

Financial Services

ICT tailored solutions


Page 4: PROTEUS H2020 at Ficloud2016

Overview

1 Introduction

2 State-of-the-art

3 PROTEUS

4 Conclusions

5 Work in progress


Page 5: PROTEUS H2020 at Ficloud2016

Outline

1 Introduction

2 State-of-the-art

3 PROTEUS

4 Conclusions

5 Work in progress


Page 6: PROTEUS H2020 at Ficloud2016

PROTEUS

PROTEUS: Scalable Online Machine Learning and Real-Time Interactive Visual Analytics

Funding program: H2020 project

Duration: 36 months (Dec 2015 - Nov 2018)

Consortium: Treelogic (Coordinator), ArcelorMittal, DFKI, Novelti, Bournemouth University, Trilateral Research

What is PROTEUS about? Its mission is to investigate and develop ready-to-use, scalable online machine learning algorithms and real-time interactive visual analytics to deal with extremely large data sets and data streams.


Page 7: PROTEUS H2020 at Ficloud2016

Content of this talk

What this talk is about

Big Data scalable architecture

Real-time processing

Incremental processing

Real-time visualization

What this talk is not about

Machine Learning


Page 8: PROTEUS H2020 at Ficloud2016

Outline

1 Introduction

2 State-of-the-art

3 PROTEUS

4 Conclusions

5 Work in progress


Page 9: PROTEUS H2020 at Ficloud2016

Big Data: Introduction


Page 10: PROTEUS H2020 at Ficloud2016

Big Data: real-time processing & visualization

Big Data real-time processing engines

They are usually general-purpose data analytics systems

Different stream processing approaches: micro-batches, flexible windows, etc.

They provide multi-purpose libraries that work on top of them

They are also compatible with many distributed data sources: HBase, Apache Kafka, RabbitMQ, S3, etc.

Open source
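The two stream processing approaches named above can be contrasted with a minimal Python sketch. It is engine-agnostic: the function names `tumbling_windows` and `sliding_windows` are hypothetical and do not come from any particular framework's API. The first groups a stream into fixed, non-overlapping batches (the micro-batch idea); the second emits overlapping windows that advance by a configurable step.

```python
from collections import deque
from itertools import islice
from typing import Iterable, Iterator, List


def tumbling_windows(stream: Iterable[int], size: int) -> Iterator[List[int]]:
    """Yield consecutive, non-overlapping batches of up to `size` elements."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def sliding_windows(stream: Iterable[int], size: int, step: int) -> Iterator[List[int]]:
    """Yield windows of `size` elements, advancing `step` elements each time."""
    window: deque = deque(maxlen=size)
    for index, item in enumerate(stream, start=1):
        window.append(item)
        # Emit once the window is full, then again after every `step` new elements.
        if index >= size and (index - size) % step == 0:
            yield list(window)
```

With `size=3`, the tumbling version never repeats an element, while the sliding version with `step=2` shares one element between neighbouring windows.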

Big Data Visual Analytics

Provide connectors to access and visualize previously stored data from (almost) anywhere

Allow users to create and customize their dashboards with a predefined set of charts

Make data understandable so decisions can be driven by data


Page 11: PROTEUS H2020 at Ficloud2016

Big Data and visual analytics challenges

Big data processing frameworks

Processing time still depends on data volume

You need to write two different programs: back-end (big data) and front-end (visualization)

Visual Analytics tools

Non-customizable visualization methods

Non-Open Source licenses

Most of them require moving data to the cloud

Installation and configuration process: rocket science

Bad performance with Big Data


Page 12: PROTEUS H2020 at Ficloud2016

Outline

1 Introduction

2 State-of-the-art

3 PROTEUS

4 Conclusions

5 Work in progress


Page 13: PROTEUS H2020 at Ficloud2016

Solution?


Page 14: PROTEUS H2020 at Ficloud2016

PROTEUS: Architecture

Enables you to process & visualize data in real time

Almost the whole program is written in the back-end

Component-based architecture

Data collector

Incremental processing engine

Real-time visualization library

It is based on existing solutions (Apache Kafka, Apache Flink and D3).
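How the three components hand data to one another can be sketched with plain Python generators. This is an illustration of the data flow only, under the assumption that each stage consumes the previous stage's output lazily; the function names `collect`, `process` and `render` are hypothetical stand-ins and do not use Kafka, Flink or D3.

```python
from typing import Iterable, Iterator, List


def collect(source: Iterable[float], chunk_size: int) -> Iterator[List[float]]:
    """Stand-in for the data collector: group incoming records into chunks."""
    chunk: List[float] = []
    for record in source:
        chunk.append(record)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk  # flush the final, possibly shorter chunk


def process(chunks: Iterable[List[float]]) -> Iterator[float]:
    """Stand-in for the incremental engine: emit a running total per chunk."""
    total = 0.0
    for chunk in chunks:
        total += sum(chunk)
        yield total


def render(results: Iterable[float]) -> List[str]:
    """Stand-in for the visualization library: format each intermediate result."""
    return [f"running total = {value:g}" for value in results]


lines = render(process(collect([1, 2, 3, 4, 5], chunk_size=2)))
```

Each chunk produces one intermediate result, so the last stage has something to display before the whole input has been read.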


Page 15: PROTEUS H2020 at Ficloud2016

PROTEUS: Incremental algorithms

Allow end-users to obtain results in real-time

Avoid recomputing whole data volumes after every small change

Create very interactive applications

Allow decision making in real-time
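To make "avoid recomputing whole data volumes" concrete, consider the average. After n values, the mean can be updated from the previous mean alone: mean_n = mean_(n-1) + (x_n - mean_(n-1)) / n. The class below is a minimal sketch of that update rule; the name `RunningMean` is hypothetical and is not taken from the PROTEUS code base.

```python
class RunningMean:
    """Mean that is updated per value instead of recomputed over all data."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0

    def update(self, value: float) -> float:
        """Fold one new value into the mean and return the updated mean."""
        self.count += 1
        # Move the previous mean towards the new value by 1/count of the gap.
        self.mean += (value - self.mean) / self.count
        return self.mean
```

Each update costs constant time and memory, regardless of how much data has already been seen, which is what lets results be shown while data is still arriving.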


Page 16: PROTEUS H2020 at Ficloud2016

PROTEUS: Data collector

It is in charge of collecting data from different sources.

When new data is generated/available, a Kafka Producer stores it in a distributed Kafka cluster.

Serializes, compresses and encrypts data, if needed.

Splits data streams into chunks and generates windows.
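The serialize-and-compress step can be sketched with the Python standard library alone. This is an assumption-laden illustration: it uses JSON and zlib as stand-ins for whatever formats the collector actually uses, it leaves out encryption, and it does not talk to Kafka. The helper names `encode_chunk` and `decode_chunk` and the record fields are hypothetical.

```python
import json
import zlib
from typing import Any, Dict, List


def encode_chunk(records: List[Dict[str, Any]]) -> bytes:
    """Serialize a chunk of records to compact JSON and compress the bytes."""
    payload = json.dumps(records, separators=(",", ":")).encode("utf-8")
    return zlib.compress(payload)


def decode_chunk(message: bytes) -> List[Dict[str, Any]]:
    """Reverse of encode_chunk: decompress, then parse the JSON."""
    return json.loads(zlib.decompress(message).decode("utf-8"))


# Hypothetical sensor readings; the field names are illustrative only.
readings = [{"sensor": "s1", "value": float(i)} for i in range(100)]
message = encode_chunk(readings)
```

The resulting `message` is an opaque byte string of the kind a producer would publish as one message value, and `decode_chunk` shows what the consuming side has to undo.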


Page 17: PROTEUS H2020 at Ficloud2016

PROTEUS: Incremental processing engine

Receives data in chunks from the previous component

It performs an incremental operation for each data chunk

Arithmetic & statistics: sum, multiply, divide, average, covariance, Pearson correlation, etc.

Others: sorting, cleaning, filtering, etc.

Returns an output for each data chunk
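As an illustration of a per-chunk incremental statistic, the sketch below maintains covariance and Pearson correlation across chunks using a Welford-style update of running means and co-moments. This is a well-known textbook technique chosen for the example, not necessarily the method used in the PROTEUS engine; the class name `RunningCorrelation` is hypothetical.

```python
import math
from typing import Iterable, Tuple


class RunningCorrelation:
    """Covariance and Pearson correlation updated chunk by chunk."""

    def __init__(self) -> None:
        self.n = 0
        self.mean_x = 0.0
        self.mean_y = 0.0
        self.m2_x = 0.0  # running sum of squared deviations of x
        self.m2_y = 0.0  # running sum of squared deviations of y
        self.c_xy = 0.0  # running sum of co-deviations of x and y

    def update(self, chunk: Iterable[Tuple[float, float]]) -> None:
        """Fold one chunk of (x, y) pairs into the running state."""
        for x, y in chunk:
            self.n += 1
            dx = x - self.mean_x  # deviation from the mean before this pair
            dy = y - self.mean_y
            self.mean_x += dx / self.n
            self.mean_y += dy / self.n
            # The second factor uses the mean after this pair (Welford's trick).
            self.m2_x += dx * (x - self.mean_x)
            self.m2_y += dy * (y - self.mean_y)
            self.c_xy += dx * (y - self.mean_y)

    def covariance(self) -> float:
        """Sample covariance (n - 1 denominator); needs at least two pairs."""
        if self.n < 2:
            raise ValueError("covariance needs at least two pairs")
        return self.c_xy / (self.n - 1)

    def pearson(self) -> float:
        """Pearson correlation; undefined if either variable is constant."""
        return self.c_xy / math.sqrt(self.m2_x * self.m2_y)
```

After every `update` call, `covariance()` and `pearson()` return the value for all data seen so far, so an output is available for each chunk without revisiting earlier chunks.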


Page 18: PROTEUS H2020 at Ficloud2016

PROTEUS: Real-time visualization library

Receives outcomes from the previous component

Enables data visualization in real time

Allows users to easily interact and explore data

Wide range of graphs

Classical charts: Linechart, Barchart, Piechart, etc.

Novel charts: Streamgraph, Swimlane, Gauge, Sunburst
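One common way to keep a live chart responsive is to retain only the most recent points, so each redraw touches a bounded amount of data. The sketch below shows that idea in Python for consistency with the other examples. It is an assumption about how a real-time chart might buffer its input, not a description of the PROTEUS library, and the class name `ChartBuffer` is hypothetical.

```python
from collections import deque
from typing import Deque, List, Tuple


class ChartBuffer:
    """Keep only the newest `capacity` points so redraw cost stays bounded."""

    def __init__(self, capacity: int) -> None:
        self.points: Deque[Tuple[int, float]] = deque(maxlen=capacity)

    def push(self, timestamp: int, value: float) -> None:
        """Add a point; the oldest point is evicted once the buffer is full."""
        self.points.append((timestamp, value))

    def series(self) -> List[Tuple[int, float]]:
        """Return the points currently visible, oldest first."""
        return list(self.points)
```

A chart fed from such a buffer scrolls forward as new results arrive from the processing engine.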


Page 19: PROTEUS H2020 at Ficloud2016

Outline

1 Introduction

2 State-of-the-art

3 PROTEUS

4 Conclusions

5 Work in progress


Page 20: PROTEUS H2020 at Ficloud2016

Conclusions

Allows users to easily write back-end and front-end programs all in one

It allows users not only to learn the final result, but also to visualize intermediate outcomes.

Enables decision making before knowing the final result

Real-time data exploration and visualization

Open-source solution [1]. Feel free to contribute!

[1] http://github.com/proteus-h2020


Page 21: PROTEUS H2020 at Ficloud2016

Outline

1 Introduction

2 State-of-the-art

3 PROTEUS

4 Conclusions

5 Work in progress


Page 22: PROTEUS H2020 at Ficloud2016

Future Work

Work in progress

Extend the current incremental operations

Include not only arithmetic and statistical operations, but also online machine learning techniques.

Anomaly detection

Extend the visualization library.

Include new interactive charts

Support Canvas image rendering

Research new visualization methods and techniques


Page 23: PROTEUS H2020 at Ficloud2016

That’s all!

Contact us: [email protected]

0xNacho

0xNacho
