Big Data Made Easy in the Era of the Cloud - Demi Ben-Ari

Post on 22-Jan-2018

Category: Software

Big Data made easy in the era of Cloud
Demi Ben-Ari - VP R&D @ Panorays

About Me

Demi Ben-Ari, Co-Founder & VP R&D @ Panorays
● Google Developer Expert
● Co-Founder of Communities:
  ○ “Big Things” - Big Data, Data Science, DevOps
  ○ Google Developer Group Cloud
  ○ Ofek Alumni Association

In the Past:
● Sr. Data Engineer - Windward
● Team Leader & Sr. Java Software Engineer, Missile defence and Alert System - “Ofek” – IAF

Automate the Security Management of Third Parties

Capture the Hacker’s View

Get Realtime Ratings

Comply with Regulations

Say “Distributed”, Say “Big Data”, Say…

What is Big Data (IMHO)? And What to Monitor?

● Systems involving the “3 Vs”: What are the right questions we want to ask?
  ○ Volume - How much?
  ○ Velocity - How fast?
  ○ Variety - What kind? (Difference)

What has happened in recent years?

● Storage got cheaper
● The volume of data grew exponentially
● Cloud service providers grew rapidly
● Connectivity got much easier
● The cloud made “on-demand” computation possible
● “Compute” started moving to the “Data”, and not the other way around

Situations & Problems

https://imgflip.com/i/1ap5kr
http://kingofwallpapers.com/otter/otter-004.jpg

MongoDB + Spark

[Diagram labels: Spark Cluster (Master; Worker 1, Worker 2, …, Worker N); Write / Read; Sharded MongoDB (Master, Replica Set)]

Cassandra + Spark

[Diagram labels: Spark Cluster (Worker 1, Worker 2, …, Worker N); Write / Read; Cassandra Cluster]

Cassandra + Serving

[Diagram labels: UI Clients (four); Web Services (four); Write / Read; Cassandra Cluster]

Distributed Microservices Architecture

[Diagram labels: Service A, Service B, Service C; Queue; DB (several); Cache (two); Web Server; Analytics Cluster (Master, three Slaves)]

Monitoring System???

Did someone say Containers?

Docker Environments

● Docker?

● Orchestration?

VS

● Wait, what about local mode?
  ○ Minikube vs Docker Engine

Problems

● Multiple physical servers
● Multiple logical services
● Want Scaling => More Servers

Data Flow and Environment (Use Case)

Structure of the Data

● Maritime Analytics Platform
● Geo Locations + Metadata
● Arriving over time
● Different types of messages being reported by satellites
● Encoded (For compression reasons)
● Might arrive later than actually transmitted
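
Because messages can arrive later than they were transmitted, it helps to keep both timestamps on every record. The following is a minimal sketch of that idea, not the platform's actual data model: the `Message` fields, `vessel_id`, and the helper names are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Message:
    # All field names are illustrative assumptions, not the real schema.
    vessel_id: str
    transmitted_at: datetime  # when the message was actually sent
    received_at: datetime     # when it reached the data pipeline
    lat: float
    lon: float


def lateness(msg: Message) -> timedelta:
    """How long after transmission the message arrived."""
    return msg.received_at - msg.transmitted_at


def bucket_by_transmission_hour(messages):
    """Group messages by the hour they were transmitted, not received,
    so a late arrival still lands in the time window it describes."""
    buckets = defaultdict(list)
    for msg in messages:
        hour = msg.transmitted_at.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(msg)
    return dict(buckets)


def late_messages(messages, threshold: timedelta):
    """Messages whose arrival lagged transmission by more than `threshold`."""
    return [m for m in messages if lateness(m) > threshold]
```

Grouping by transmission time means a late message changes a window that may already have been processed, so downstream layers would need to be recomputed for that window.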

Data Flow Diagram

[Diagram labels: External Data Source; Data Pipeline; Parsed Raw; Analytics Layers; Entity Resolution Process; Building insights on top of the entities; Data Output Layer; Anomaly Detection; Trends; UI for End Users]

Environment Description

[Diagram labels: Cluster; Environments: Dev, Testing, Live, Staging, Production; OB1K; RESTful Java Services]

Monitoring Your Data

https://memegenerator.net/instance/53617544

Data Questions: What should be measured?

● Did all of the computation occur?
  ○ Are there any data layers missing?
● How much data do we have? (Volume)
● Is all of the data in the Database?
● Data Quality Assurance
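
The questions above can be turned into small automated checks. The sketch below is one possible shape for them, under stated assumptions: the layer names, the 50% tolerance, and all function names are hypothetical, and the counts would come from whatever storage and compute tools are actually in use.

```python
def missing_layers(expected_layers, record_counts):
    """Return the expected data layers that produced no records.

    `record_counts` maps a layer name to the number of records written
    in the latest run; an absent layer counts as zero.
    """
    return [layer for layer in expected_layers
            if record_counts.get(layer, 0) == 0]


def volume_within_baseline(current, baseline, tolerance=0.5):
    """Check that today's volume is within `tolerance` (a fraction)
    of a baseline such as the average of previous runs."""
    if baseline == 0:
        return current == 0
    return abs(current - baseline) / baseline <= tolerance


def missing_from_database(computed_ids, stored_ids):
    """IDs that were computed but never made it into the database."""
    return set(computed_ids) - set(stored_ids)
```

For example, with hypothetical layers named after the data flow stages:

```python
counts = {"parsed_raw": 1000, "entity_resolution": 0, "insights": 40}
expected = ["parsed_raw", "entity_resolution", "insights", "anomaly_detection"]
print(missing_layers(expected, counts))
```

Each check returns plain data, so the results can be sent to whichever monitoring or alerting system is already in place.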

Conclusions

● Keep all of the Data that you can
  ○ In its most raw form
● Duplicating Data is not a bad thing
● On-demand compute will save you a lot of time and money
● Find the relevant tool to solve each problem
  ○ Not one tool that will solve all of them (No such thing)
● Use the cloud as an auxiliary tool
  ○ It will boost your productivity considerably

Questions?
