
Page 1

© 2014 Wendelin Project et al. – CC SA-NC

Wendelin Exanalytics 2020 Big Data with MariaDB

2014-04-03 – Santa Clara

www.wendelin.io

Page 2

Agenda

● Our background: ERP5

● Our future: Wendelin Exanalytics

● Our challenge: out-of-core

Page 3

ERP5

Built with Python, NEO and MariaDB

Modules: CRM, HR, Document Management, Supply Chain, Finance, MRP, Web Workflow

Features: Fine Grain Security, Full Traceability, Scalability, Flexibility, Customisation, Rapid prototyping, Zope TTW on steroids

Also: online contribution for 3rd parties, to-do lists, notifications, careers and assignments, payroll, projects

Industries: Banking, Aerospace, Health, Chemical, Government, NGO, Cloud Computing, Consulting, Mechanical

Page 4

TerraSAR-X Satellite

Accessible to Airbus partners and distributors

Interfaces with DLR (German Space Agency)

« With ERP5, our partners all over the world can access our infrastructure and order online with complete security. » Ralf Duering

Management of sales and production of images

Compliant with ESA standard (ECSS)

Page 5

SANEF Group

« Web has become our primary sales channel. » Frédéric Charlier

Online sales and customer relations for electronic toll collection (ETC)

120,000 new customers / year

51,000 invoices / hour

7,000,000 contacts / year

250 users

Implemented in 4 months

Page 6

Open Source ERP/CRM for S&P 100

Page 7

Agenda

● Our background: ERP5

● Our future: Wendelin Exanalytics

● Our challenges with MariaDB

Page 8

Take the Best Analytics

scikit-learn.org

Page 9

Made by Great Mathematicians

http://en.wikipedia.org/wiki/Fields_Medal

Wendelin Werner

Page 10

Add Distributed Storage

NEO: neoppod.org

Page 11

Add Elastic PaaS

erp5.com

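# Initialize data
data_size = 1000000
server_count = 1000
chunk_size = data_size // server_count  # integer chunk length
data = array(data_size)  # placeholder from the slide: a persistent ERP5 array object

# Process data in parallel on each server (Map Reduce, Batch, etc.);
# ERP5's activate() queues each call as a deferred activity
for server in range(server_count):
    data.activate().process(server * chunk_size, chunk_size)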

PaaS
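The loop above fans one deferred call per server out over equal chunks, then relies on the results being combined. As a local, runnable approximation only, the sketch below swaps ERP5's activate() activity queue for Python's concurrent.futures.ThreadPoolExecutor and uses a hypothetical process function that sums its chunk; the final sum plays the reduce step. It assumes the data length divides evenly by the server count and does not reflect how ERP5 actually schedules activities.

```python
from concurrent.futures import ThreadPoolExecutor


def process(data, start, length):
    # Hypothetical stand-in for data.activate().process(start, length):
    # each "server" handles only its own slice of the data.
    return sum(data[start:start + length])


def parallel_process(data, server_count):
    # Assumes len(data) is a multiple of server_count, as on the slide.
    chunk_size = len(data) // server_count
    with ThreadPoolExecutor(max_workers=8) as pool:
        # Map step: one task per server, each on a disjoint chunk.
        futures = [
            pool.submit(process, data, server * chunk_size, chunk_size)
            for server in range(server_count)
        ]
        # Reduce step: combine the partial results.
        return sum(f.result() for f in futures)


data = list(range(1_000_000))
print(parallel_process(data, 1000))
```

With 1,000 chunks of 1,000 values each, every element is covered exactly once, so the result equals sum(range(1_000_000)), which is 499999500000.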

Page 12

And Multicloud Deployment

slapos.org

MMC Rus

Page 13

Wendelin Exanalytics Core: 100% open source, 100% Python

Data Analytics: Scikit Learn

Elastic PaaS: ERP5

Distributed Storage: NEO

Multicloud Deployment: SlapOS

Multi Data Center

Page 14

Wendelin User Interface

renderjs.org

Page 15

Wendelin Options: 100% open source, 100% Python

Scikit Learn

NEO

Pandas: time sequence processing (DataPad / JP Morgan)

Numba / Parakeet: JIT compiler / type inference (Continuum / DARPA)

Blaze: full out-of-core arrays (Continuum / DARPA)

Fluentd: realtime log collection (Treasure Data / Amazon)

NLTK: Natural Language Toolkit (U. Texas / Chalmers)

Page 16

Wendelin Applications

● Intrusion detection

● Fraud detection

● Business and economic forecasting

● Marketing

● Media analysis

● Public security

● Brain Computer Interface

● Internet Of Things

Page 17

Business Model: German Style, No VC

Diagram labels: Nexedi (WendelinCo); Scikit Learn; Big Data System User (Extension 1); Big Data System Supplier (Extension 2); 100% open source hardware; 100%; 1 - 10% proprietary

Page 18

Agenda

● Our background: ERP5

● Our future: Wendelin Exanalytics

● Our challenge: out-of-core

Page 19

Out-of-core arrays

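import numpy as np

# NumPy: a small in-memory array
np.ndarray(shape=(2, 2), dtype=float, order='F')

# Out-of-core data: one dimension too large for memory (conceptual; NumPy refuses to allocate it)
np.ndarray(shape=(10**18, 2), dtype=float, order='F')

# Full out-of-core: every dimension is huge (conceptual)
np.ndarray(shape=(10**9, 2 * 10**9), dtype=float, order='F')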

Both out-of-core examples are at exabyte scale: 2 × 10^18 float64 values, about 16 EB each.

The best out-of-core topology depends on the algorithm and on the array geometry.
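To see why the last two shapes are out of reach for an in-memory NumPy array, the sketch below computes their size from the shape alone, without allocating anything, assuming 8-byte float64 items (the helper name nbytes is illustrative). It also prints the strides of a small Fortran-ordered array: with order='F' each column is contiguous, so a column-by-column algorithm reads sequentially, which is one way geometry and algorithm together shape the storage layout.

```python
import math

import numpy as np

ITEM = np.dtype(float).itemsize  # 8 bytes for float64


def nbytes(shape):
    # Bytes the array would need, computed from the shape only (no allocation).
    return math.prod(shape) * ITEM


for shape in [(2, 2), (10**18, 2), (10**9, 2 * 10**9)]:
    print(shape, nbytes(shape) / 1e18, "EB")

# Geometry: in Fortran order, stepping down a column moves 8 bytes,
# stepping to the next column moves a whole column (4 rows x 8 bytes).
a = np.zeros((4, 3), dtype=float, order='F')
print(a.strides)  # (8, 32)
```

Both large shapes hold 2 × 10^18 values, so each needs 1.6 × 10^19 bytes (16 EB), far beyond the RAM of any single machine.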

Page 20

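neo.ndarray: out-of-core data

Figure: a neo.ndarray drawn as a 3 × 4 grid of blocks numbered row by row (1 2 3 4 / 5 6 7 8 / 9 10 11 12), next to the same blocks 1 to 12 laid out as a linear sequence.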
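The figure suggests an array cut into blocks that are laid out as a grid and numbered row by row. Assuming equal rectangular blocks (the block shape and the function name block_of are hypothetical, not neo.ndarray's API), this sketch maps an element index to its block number and to its offset inside that block, which is what an out-of-core reader needs in order to load only the blocks it actually touches.

```python
def block_of(i, j, block_shape, grid_cols):
    """Return (block_number, (row_in_block, col_in_block)) for element (i, j).

    Hypothetical layout: equal blocks of block_shape, arranged in a grid
    with grid_cols blocks per row, numbered row by row starting at 1.
    """
    bh, bw = block_shape
    grid_row, row_in = divmod(i, bh)
    grid_col, col_in = divmod(j, bw)
    return grid_row * grid_cols + grid_col + 1, (row_in, col_in)


# A 300 x 400 array with 100 x 100 blocks gives a 3 x 4 grid of blocks 1..12.
print(block_of(0, 0, (100, 100), 4))      # (1, (0, 0))
print(block_of(150, 250, (100, 100), 4))  # (7, (50, 50))
print(block_of(299, 399, (100, 100), 4))  # (12, (99, 99))
```

Reading element (150, 250) then only requires fetching block 7 from storage, not the whole array.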

Page 21

NEO Overview

neoctl: state access, command control

Master: OID & TID allocation, synchronisation, load balancing

Storage: object data, transaction data, partition table

Application: ZODB, neo.client

Admin: state archival, command proxy

Legend: data links, control links
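The master's "OID & TID allocation" role can be illustrated with a toy allocator. This is a hypothetical sketch (ToyMaster, new_oids and new_tid are invented names, not NEO's API): OIDs are handed out in contiguous ranges, TIDs are forced to increase strictly even when two commits see the same clock value, and both are encoded as 8-byte big-endian strings, like ZODB's 8-byte ids, so that byte order matches numeric order.

```python
import struct
import threading


class ToyMaster:
    """Hypothetical allocator: unique OIDs, strictly increasing TIDs."""

    def __init__(self):
        self._lock = threading.Lock()
        self._next_oid = 0
        self._last_tid = 0

    def new_oids(self, count):
        # Reserve a contiguous range of OIDs under the lock.
        with self._lock:
            start = self._next_oid
            self._next_oid += count
        # 8-byte big-endian encoding keeps byte order equal to numeric order.
        return [struct.pack(">Q", n) for n in range(start, start + count)]

    def new_tid(self, now):
        # Never reuse or go back: take the clock value, or last + 1 if larger.
        with self._lock:
            self._last_tid = max(self._last_tid + 1, now)
            return struct.pack(">Q", self._last_tid)


m = ToyMaster()
print(m.new_oids(2))
print(m.new_tid(100) < m.new_tid(100))  # True: the second TID becomes 101
```

Centralising allocation in one master is what lets many storage nodes accept writes without agreeing on ids among themselves.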


Page 23

Object retrieval

Retrieve x: hash(x._p_oid) selects the partition

Storage nodes: S1, S2, S3

Node table (variable size):
S1, IP:PORT, state ?
S2, IP:PORT, state Connected
S3, IP:PORT, state ?

Partition table (variable size):
Partition 0: S1, S3
Partition 1: S2, S3
...
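A client-side lookup following the two tables could look like the sketch below. The slide only says hash(x._p_oid); here that hash is assumed to be the 8-byte OID read as an integer, modulo the number of partitions, and the replica choice (prefer a node already in the Connected state, else try the first replica) is a simplification, not NEO's actual client logic. The table contents copy the slide.

```python
# Node table from the slide ("?" = state unknown to this client).
node_state = {"S1": "?", "S2": "Connected", "S3": "?"}

# Partition table from the slide: each partition is replicated on two nodes.
partition_nodes = {0: ["S1", "S3"], 1: ["S2", "S3"]}


def partition_of(oid: bytes, num_partitions: int) -> int:
    # Assumed stand-in for hash(x._p_oid): OID as an integer, modulo partitions.
    return int.from_bytes(oid, "big") % num_partitions


def storage_for(oid: bytes) -> str:
    candidates = partition_nodes[partition_of(oid, len(partition_nodes))]
    # Prefer a replica we are already connected to.
    for node in candidates:
        if node_state[node] == "Connected":
            return node
    # Otherwise fall back to the first replica listed for the partition.
    return candidates[0]


print(storage_for((1).to_bytes(8, "big")))  # partition 1 -> S2
print(storage_for((2).to_bytes(8, "big")))  # partition 0 -> S1
```

Because every client computes the partition itself, only the partition table has to be shared; no central lookup is needed per object.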

Page 24

Roadmap

● Q2 2014: neo.ndarray

● Q3 2014: developer release of Wendelin

● Q4 2014: neo.ndarray with simple optimizations

● Q1 2015: embedded MariaDB

● Q2 2015: coloured caches

● Q3 2015: coloured caches with C client cache

● Q4 2015: Go storage

Page 25

Challenges

● Reduce latency → embedded MariaDB?

● Reduce SQL overhead → precompiled queries?

● Reduce copies → BLOB protocol?

● Accelerate storage → C++? Go?

● Optimize cache → coloured caching

Page 26

Wendelin Exanalytics 2020 Big Data with MariaDB

2014-04-03 – Santa Clara

www.wendelin.io