Wiring open source big data technologies
Paolo Platter @ AgileLab - Luca Barzè @ SIA
#redhatosd https://github.com/agile-lab-dev/wasp
Agile Lab
Big Data Projects - Advisory services - Training - Analytics outsourcing
BUSINESS HIGHLIGHTS:
European leader in the areas of payments, cards, network services and capital markets
Founded in July 1977
Innovative network applications for banks and businesses
RTGS Advanced collection and payment services
ATM/POS terminal management
Front-end services for companies and P.A.
Innovative technology solutions for marketing
Consulting
CORPORATE INFORMATION (SIA GROUP)
Employees: 1,612
Revenues: 449.4 million €
NETWORK
358.2 terabytes of data carried
100% availability
180,000 km of network
PUBLIC SECTOR BODIES
CORPORATES
FINANCIAL INSTITUTIONS
CAPITAL MARKETS
CENTRAL INSTITUTIONS
CARDS
3.9 billion operations
747,325 merchants managed
65.1 million credit, debit and prepaid cards
PAYMENTS
2.8 billion transactions
1,154 customers
INSTITUTIONAL SERVICES
41.7 billion financial transactions
over 100 brokers and traders in 18 countries adopting compliance & surveillance systems
350 million deal proposals handled daily
In recent years:
New technologies: in-memory computing, NoSQL, message brokers, distributed computing frameworks, …
New business opportunities: from a reactive to a proactive approach to interactions with clients or suppliers
Users expect everything to be online and immediately usable
Many projects started, each one with its own technical stack
New and huge sources of data
Big Data: From Batch to Streaming
Log Management
Financial Data
Advertisement
Legacy Systems Offloading
Fleet Management
IoT
Click Stream
See the SIA talk on instant payments
SOURCE (files)
JOB
STORAGE
RDBMS, file systems, …: Slow. Difficult to add new features (analytics, machine learning, …)
«Big Data»: Fast. Possible to add new features
From cold to hot data. Archiving
Fast enough for analytics, BI, …
RDBMS? Not for big amounts of data
«HADOOP»? Designed for batch jobs
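The shift from batch to streaming can be illustrated with a minimal sketch. It assumes a toy event format (a dict with a hypothetical `type` field) and plain Python instead of any real engine: the batch function recomputes the aggregate from the whole data set, while the streaming class keeps running state and updates it one event at a time.

```python
from collections import Counter


def batch_counts(events):
    """Batch style: recompute the aggregate from the full data set."""
    return Counter(e["type"] for e in events)


class StreamingCounts:
    """Streaming style: keep running state and update it per event."""

    def __init__(self):
        self.counts = Counter()

    def on_event(self, event):
        # The result is up to date as soon as the event arrives.
        self.counts[event["type"]] += 1
        return self.counts[event["type"]]


# Hypothetical events, for illustration only.
events = [{"type": "payment"}, {"type": "login"}, {"type": "payment"}]

stream = StreamingCounts()
for e in events:
    stream.on_event(e)
```

Both approaches reach the same totals; the difference is that the streaming version has an answer after every event instead of after every job run.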
A typical high-level architecture
MESSAGING: decoupling, spikes, …
EXECUTION: computing cluster framework
OPS METRICS - WEB APPLICATIONS - APPLICATION LOGS - EVENT STREAMS
predictive modeling - updated analytics - data transformations - real-time alerting - …
Fast access to both historical and (near) real-time massive data on the fly
ML ENGINE
…
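The role of the messaging layer (decoupling producers from consumers and absorbing spikes) can be sketched with the standard library alone. This is an illustration under stated assumptions, not how a broker such as Kafka is implemented: an in-memory `queue.Queue` stands in for a topic, and `run_pipeline`, the message names, and the upper-casing step are all hypothetical.

```python
import queue
import threading


def run_pipeline(messages, buffer_size=1000):
    """Push messages through a bounded buffer; return what the consumer processed."""
    buffer = queue.Queue(maxsize=buffer_size)  # stands in for a broker topic
    processed = []

    def producer():
        # The producer can emit a burst without waiting for processing,
        # as long as the buffer has room.
        for m in messages:
            buffer.put(m)
        buffer.put(None)  # sentinel: no more messages

    def consumer():
        # The consumer drains the buffer at its own pace.
        while True:
            m = buffer.get()
            if m is None:
                break
            processed.append(m.upper())  # placeholder for real processing

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return processed


result = run_pipeline([f"event-{i}" for i in range(100)])
```

The producer and the consumer never call each other directly; either side can be replaced or slowed down without changing the other, which is the decoupling the slide refers to.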
With a custom architecture
You want: Data, Analytics, Business use cases, Value
You have to:
feed more storages from the same source
periodically train machine learning models
deliver messages at least once or exactly once
apply schemas to unstructured data
integrate different technologies
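Two of these chores, applying a schema to raw data and coping with at-least-once delivery, can be sketched in a few lines. Everything below is a hypothetical illustration (the `SCHEMA` fields, `IdempotentSink`, and the simulated lost acknowledgement are assumptions, not part of any real framework): duplicates caused by redelivery are made harmless by writing to a sink keyed on the record id.

```python
import json

# Hypothetical schema: field name -> type used to coerce the raw value.
SCHEMA = {"id": str, "amount": float, "currency": str}


def apply_schema(raw_line, schema=SCHEMA):
    """Parse a raw JSON line and coerce each field to its schema type.

    Raises ValueError on malformed JSON or on a missing field.
    """
    record = json.loads(raw_line)  # JSONDecodeError is a ValueError subclass
    try:
        return {name: cast(record[name]) for name, cast in schema.items()}
    except KeyError as missing:
        raise ValueError(f"missing field {missing}") from None


class IdempotentSink:
    """Storage keyed by record id: writing the same record twice is harmless."""

    def __init__(self):
        self.rows = {}

    def write(self, record):
        self.rows[record["id"]] = record


def deliver_at_least_once(lines, sink, lose_first_ack_for=()):
    """Write each record until the write is acknowledged.

    A simulated lost acknowledgement forces one redelivery (a duplicate).
    Returns the total number of write attempts.
    """
    deliveries = 0
    for line in lines:
        record = apply_schema(line)
        ack_lost = record["id"] in lose_first_ack_for
        acked = False
        while not acked:
            sink.write(record)
            deliveries += 1
            if ack_lost:
                ack_lost = False  # the retry will be acknowledged
            else:
                acked = True
    return deliveries


lines = [
    '{"id": "t1", "amount": "10.5", "currency": "EUR"}',
    '{"id": "t2", "amount": 3, "currency": "EUR"}',
]
sink = IdempotentSink()
deliveries = deliver_at_least_once(lines, sink, lose_first_ack_for={"t1"})
```

Here three writes happen for two messages, yet the sink ends up with exactly two rows: at-least-once delivery combined with an idempotent write behaves like exactly-once from the reader's point of view.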
WASP is a just-released open-source framework for building complex real-time big data applications.
It relies on a kind of Kappa/Lambda architecture, mainly leveraging Kafka and Spark, giving you a standard reference architecture.
If you need to ingest huge amounts of heterogeneous data and analyze them through complex pipelines, this is the framework for you.
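As a rough illustration of the Lambda idea mentioned above (this is not WASP's API, and all names are hypothetical): a batch view covers events up to a cutoff, a speed view covers the events that arrived after it, and a query merges the two. In a Kappa-style setup the same result would come from a single streaming path that replays the log.

```python
from collections import Counter


def batch_view(events, cutoff):
    """Batch layer: aggregate every event strictly before the cutoff timestamp."""
    return Counter(e["key"] for e in events if e["ts"] < cutoff)


def speed_view(events, cutoff):
    """Speed layer: aggregate only the recent events, from the cutoff onward."""
    return Counter(e["key"] for e in events if e["ts"] >= cutoff)


def query(key, batch, speed):
    """Serving side: merge the precomputed batch view with the real-time view."""
    return batch[key] + speed[key]


# Hypothetical events, for illustration only.
events = [
    {"ts": 1, "key": "card"},
    {"ts": 2, "key": "card"},
    {"ts": 3, "key": "atm"},
    {"ts": 9, "key": "card"},  # arrived after the last batch run
]
cutoff = 5
batch = batch_view(events, cutoff)
speed = speed_view(events, cutoff)
card_total = query("card", batch, speed)
```

The merged answer covers both historical and recent data, at the cost of maintaining two code paths; that duplication is the usual argument for the Kappa variant.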
WASP
[Architecture diagram. Labels: MESSAGING LAYER, PROCESSING LAYER, STORAGE LAYER, STREAMING, BATCH MODULE, RT API, MODEL SERVER, SCHEMA REGISTRY]
[Pipeline diagrams. Labels: SOURCE, SOURCE 1, SOURCE 2, WASP PRODUCER, TOPIC 1, TOPIC 2, TOPIC n, TOPIC m, BL, Kafka writer, Index writer, Raw writer, KeyValue writer, STREAMING, BATCH MODULE, SCHEMA REGISTRY]
Deploy
Contributors and Adopters