Open source big data landscape and possible ITS applications

Post on 15-Apr-2017

Category:

Technology

TRANSCRIPT

Tomasz Szymański, Adam Warski

SoftwareMill

Open source big data landscape and possible ITS applications

Big Data? Fast Data?
• No clear definition
• Big Data
  – 100s+ of GB?
  – Time frame?
• Fast Data
  – Real-time
  – Single-node vs multi-node

Why Open Source?
• Large developer base
  – Easy to learn
• Projects usually backed by a commercial entity
  – Support
• Cost efficiency
  – Leverage latest developments
• Future-proofing
  – Tools with a large user base will be around for longer

Apache Spark / Cassandra / Kafka
• Data ingestion: Kafka
• Data processing: Spark
• Data storage: Cassandra
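The division of roles above (ingest, process, store) can be illustrated with a minimal in-memory sketch. It does not use Kafka, Spark or Cassandra: a plain list stands in for the ingestion log, a grouping function for the processing job, and a dict for the storage table. The event shape `(sensor_id, speed_kmh)` and all function names are assumptions made for illustration only.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

# Hypothetical event shape: (sensor_id, speed_kmh).
Event = Tuple[str, float]


def ingest(log: List[Event], events: Iterable[Event]) -> None:
    """Ingestion stage (the role Kafka plays): append events to a log."""
    log.extend(events)


def process(log: List[Event]) -> Dict[str, float]:
    """Processing stage (the role Spark plays): mean speed per sensor."""
    by_sensor: Dict[str, List[float]] = defaultdict(list)
    for sensor_id, speed_kmh in log:
        by_sensor[sensor_id].append(speed_kmh)
    return {sensor_id: mean(speeds) for sensor_id, speeds in by_sensor.items()}


def store(table: Dict[str, float], rows: Dict[str, float]) -> None:
    """Storage stage (the role Cassandra plays): upsert rows by key."""
    table.update(rows)


def run_pipeline(events: Iterable[Event]) -> Dict[str, float]:
    """Wire the three stages together and return the stored table."""
    log: List[Event] = []
    table: Dict[str, float] = {}
    ingest(log, events)
    store(table, process(log))
    return table
```

In a real deployment each stage would be a separate distributed system; the point here is only the separation of responsibilities.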

Apache Spark / Cassandra / Kafka
• Spark: largest cluster 8k nodes; users include eBay, Baidu, NASA, Amazon
• Cassandra: over 75k nodes storing 10PB of data at Apple
• Kafka: over 1.1 trillion messages per day at LinkedIn

Possible ITS applications

Hotspot detection
Computed using New York open taxi data, Akka & Apache Flink
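The slides do not say how the hotspots were computed. One simple way to approximate the idea is to bin pickup coordinates into a grid and report the busiest cells. The sketch below is a plain-Python stand-in, not the Akka/Flink implementation; the cell size, the `min_count` threshold and the function names are assumptions.

```python
import math
from collections import Counter
from typing import Iterable, List, Tuple

Cell = Tuple[int, int]


def to_cell(lat: float, lon: float, cell_deg: float = 0.005) -> Cell:
    """Map a coordinate to a square grid cell, cell_deg degrees per side."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def hotspots(
    pickups: Iterable[Tuple[float, float]],
    cell_deg: float = 0.005,
    min_count: int = 3,
) -> List[Tuple[Cell, int]]:
    """Return (cell, count) pairs with at least min_count pickups, busiest first."""
    counts = Counter(to_cell(lat, lon, cell_deg) for lat, lon in pickups)
    return [(cell, n) for cell, n in counts.most_common() if n >= min_count]
```

A streaming version would keep the same per-cell counts but restrict them to a time window, so that hotspots reflect recent activity rather than the whole data set.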

Architecture of a traffic-jam detection system
Leveraging Apache Kafka, Hadoop, Spark, Cassandra & Akka
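The transcript names the components but not the detection logic. As a hedged illustration of what the processing step might compute, the sketch below flags a road segment as jammed when the mean of its last few speed readings falls below a threshold. The class name, the window length and the 15 km/h threshold are assumptions, and the code runs in a single process rather than on the stack listed above.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class JamDetector:
    """Sliding-window mean speed per road segment (illustrative only)."""

    def __init__(self, window: int = 5, jam_speed_kmh: float = 15.0) -> None:
        self.window = window
        self.jam_speed_kmh = jam_speed_kmh
        # One bounded queue per segment; old readings drop off automatically.
        self._speeds: Dict[str, Deque[float]] = defaultdict(
            lambda: deque(maxlen=window)
        )

    def observe(self, segment_id: str, speed_kmh: float) -> bool:
        """Record one reading; return True if the segment now looks jammed."""
        speeds = self._speeds[segment_id]
        speeds.append(speed_kmh)
        if len(speeds) < self.window:
            return False  # not enough readings yet to decide
        return sum(speeds) / len(speeds) < self.jam_speed_kmh
```

Requiring a full window before reporting avoids flagging a segment on a single slow vehicle, at the cost of a short detection delay.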

Summing up and the future
• Open source has a lot to offer
• Open data?
• Fast-evolving field
  – Rapid development, rapid data insights
  – Leverage in ITS!

technical expertise

’s ITS domain experts

• Founded in 2009
• Bespoke software development services
• Various domains, including logistics & transport
• Big data a common theme in our projects
