Apache Apex Meetup at Cask

Thomas Weise <[email protected]>, Dec 2nd, 2015. Introduction to Open Source Unified Streaming and Fast Batching Platform Apache Apex


Page 1: Apache Apex Meetup at Cask

Thomas Weise <[email protected]>, Dec 2nd, 2015

Introduction to Open Source Unified Streaming and Fast Batching Platform: Apache Apex

Page 2: Apache Apex Meetup at Cask

© 2015 DataTorrent

Apex Platform Overview

Page 3: Apache Apex Meetup at Cask


Apache Malhar Library

Page 4: Apache Apex Meetup at Cask


Native Hadoop Integration

• YARN is the resource manager

• HDFS is used for storing all persistent state

Page 5: Apache Apex Meetup at Cask


Application Programming Model

A Stream is a sequence of data tuples.
An Operator takes one or more input streams, performs computations and emits one or more output streams.

• Each Operator is YOUR custom business logic in Java, or a built-in operator from our open source library
• An Operator has many instances that run in parallel, and each instance is single-threaded

A Directed Acyclic Graph (DAG) is made up of operators and streams

[Figure: Directed Acyclic Graph (DAG) of four operators connected by streams of tuples: filtered streams, enriched streams and an output stream]
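The programming model above can be illustrated with a minimal plain-Java sketch. It is a hypothetical model, not the Apex API: the `Operator` interface, the `MiniDag` class and the `connect` helper are illustrative names, and the DAG is reduced to a linear chain in which a filter operator feeds an enrich operator whose output is collected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical model of an operator (NOT the Apex API): it receives one tuple
// at a time and emits zero or more tuples to its output stream.
interface Operator<I, O> {
    void process(I tuple, Consumer<O> emit);
}

class MiniDag {
    // Filter operator: drops blank tuples, producing the "filtered stream".
    static final Operator<String, String> FILTER = (line, emit) -> {
        if (!line.isBlank()) {
            emit.accept(line);
        }
    };

    // Enrich operator: appends derived data to each tuple, producing the "enriched stream".
    static final Operator<String, String> ENRICH = (line, emit) ->
        emit.accept(line + ":" + line.length());

    // A stream connects two operators: every tuple emitted by `up` is fed to `down`.
    static <A, B, C> Operator<A, C> connect(Operator<A, B> up, Operator<B, C> down) {
        return (tuple, emit) -> up.process(tuple, b -> down.process(b, emit));
    }

    // Pushes each input tuple through the chain; the list plays the output operator.
    static List<String> run(List<String> input) {
        Operator<String, String> pipeline = connect(FILTER, ENRICH);
        List<String> output = new ArrayList<>();
        for (String tuple : input) {
            pipeline.process(tuple, output::add);
        }
        return output;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("apex", "", "yarn"))); // [apex:4, yarn:4]
    }
}
```

Each call to `process` handles one tuple at a time, which mirrors the slide's point that every operator instance is single-threaded; parallelism comes from running many instances, not from threads inside one operator.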

Page 6: Apache Apex Meetup at Cask


Application Specification

Page 7: Apache Apex Meetup at Cask


Partitioning and Scaling Out

• Operators can be dynamically scaled
• Flexible stream splits
• Parallel partitioning
• MxN partitioning
• Unifiers
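A minimal sketch of partitioning with a unifier, assuming a word-count operator whose tuples are routed to N parallel instances by key hash. `PartitionedCount`, `partitionFor`, `countPartitioned` and `unify` are hypothetical names; in Apex the platform performs the routing and merging, and this plain-Java model only imitates the idea.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of parallel partitioning plus a unifier (not the Apex API).
class PartitionedCount {
    // Routes a tuple to one of n partitions by key hash, so equal keys
    // always land on the same operator instance.
    static int partitionFor(String key, int n) {
        return Math.floorMod(key.hashCode(), n);
    }

    // Each partition is an independent counter producing a partial result.
    static List<Map<String, Integer>> countPartitioned(List<String> tuples, int n) {
        List<Map<String, Integer>> partials = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            partials.add(new HashMap<>());
        }
        for (String t : tuples) {
            partials.get(partitionFor(t, n)).merge(t, 1, Integer::sum);
        }
        return partials;
    }

    // Unifier: merges the partial outputs of all partitions into one result
    // for the downstream operator, summing counts for keys seen in several partitions.
    static Map<String, Integer> unify(List<Map<String, Integer>> partials) {
        Map<String, Integer> merged = new HashMap<>();
        for (Map<String, Integer> p : partials) {
            p.forEach((k, v) -> merged.merge(k, v, Integer::sum));
        }
        return merged;
    }

    public static void main(String[] args) {
        List<String> tuples = List.of("a", "b", "a", "c", "b", "a");
        Map<String, Integer> counts = unify(countPartitioned(tuples, 3));
        System.out.println(new TreeMap<>(counts)); // {a=3, b=2, c=1}
    }
}
```

With key-based routing each key's count lives in exactly one partition, so the unifier effectively concatenates results; with round-robin routing the same key would appear in several partitions, and the summing step in `unify` is what keeps the total correct.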

Page 8: Apache Apex Meetup at Cask


Advanced Windowing Support

• Application window
• Sliding window and tumbling window
• Checkpoint window
• No artificial latency
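Sliding and tumbling windows can be sketched under the assumption that the platform delivers one partial aggregate per small streaming window and that a larger window spans a fixed number of those. A tumbling window groups them without overlap; a sliding window advances by fewer streaming windows than it spans. `Windows`, `sliding` and `tumbling` are hypothetical names, not Apex APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not the Apex API): each element of perWindow is the
// partial sum produced during one small streaming window.
class Windows {
    // Sliding window: spans `size` streaming windows and advances by `slide`.
    // Trailing windows that would be incomplete are not emitted.
    static List<Long> sliding(long[] perWindow, int size, int slide) {
        List<Long> out = new ArrayList<>();
        for (int start = 0; start + size <= perWindow.length; start += slide) {
            long sum = 0;
            for (int i = start; i < start + size; i++) {
                sum += perWindow[i];
            }
            out.add(sum);
        }
        return out;
    }

    // Tumbling window: the special case that advances by its full size,
    // so consecutive windows never overlap.
    static List<Long> tumbling(long[] perWindow, int size) {
        return sliding(perWindow, size, size);
    }

    public static void main(String[] args) {
        long[] perWindow = {1, 2, 3, 4, 5, 6};
        System.out.println(tumbling(perWindow, 3));   // [6, 15]
        System.out.println(sliding(perWindow, 3, 1)); // [6, 9, 12, 15]
    }
}
```

Because the aggregation is expressed in whole streaming windows, results for a window are available as soon as its last streaming window closes, without an extra buffering delay.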

Page 9: Apache Apex Meetup at Cask


Platform Features

Stateful Fault Tolerance
• Supported out of the box
  – Application state
  – Application master state
  – No data loss
• Automatic recovery
• Lunch test
• Buffer server

Processing Semantics
• At least once
• At most once
• Exactly once

Data Locality
• Stream locality for placement of operators
  – Rack local: distributed deployment
  – Node local: data does not traverse the NIC
  – Container local: data doesn't need to be serialized
  – Thread local: operators run in the same thread
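The fault-tolerance and processing-semantics features can be illustrated with a toy checkpoint-and-replay loop. This is a hypothetical sketch with made-up names, not how Apex is implemented: an operator keeps a running sum, checkpoints it every few windows, and after a simulated crash restores the checkpoint and replays the windows that an upstream buffer retained (loosely modeled on the buffer server listed above). The recovered state is correct, but some output is sent twice, which is at-least-once delivery.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical checkpoint-and-replay sketch; names are illustrative, not the Apex API.
class CheckpointReplay {
    long state;                 // operator state: a running sum
    long checkpointState;       // copy of the state taken at the last checkpoint
    int checkpointWindow = -1;  // last window id covered by that checkpoint (-1 = none yet)
    final List<Long> emitted = new ArrayList<>(); // everything sent downstream

    void processWindow(long value) {
        state += value;
        emitted.add(state);
    }

    // Processes one value per window, checkpoints every `interval` windows,
    // and simulates a single crash at the end of window `failAt` (-1 = no crash).
    static CheckpointReplay run(long[] windows, int interval, int failAt) {
        CheckpointReplay op = new CheckpointReplay();
        for (int w = 0; w < windows.length; w++) {
            op.processWindow(windows[w]);
            if ((w + 1) % interval == 0) {
                op.checkpointState = op.state;
                op.checkpointWindow = w;
            }
            if (w == failAt) {
                // In-memory state is lost: restore the last checkpoint,
                op.state = op.checkpointState;
                // then replay the windows after it, which the upstream buffer retained.
                for (int r = op.checkpointWindow + 1; r <= w; r++) {
                    op.processWindow(windows[r]);
                }
            }
        }
        return op;
    }

    public static void main(String[] args) {
        CheckpointReplay op = run(new long[] {1, 2, 3, 4, 5}, 2, 2);
        System.out.println(op.state);   // 15, same as a run without the crash
        System.out.println(op.emitted); // [1, 3, 6, 6, 10, 15]: window 2's result is sent twice
    }
}
```

In this model, exactly-once output would additionally require the downstream side to discard or idempotently overwrite results for window ids it has already seen, while at-most-once would skip the replay and accept losing the windows processed since the last checkpoint.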

Page 10: Apache Apex Meetup at Cask


Dynamic Updates

Dynamic topology updates:
– Properties of operators can be changed
– New operators can be added

Page 11: Apache Apex Meetup at Cask


Data Processing Pipeline Example: App Builder

Page 12: Apache Apex Meetup at Cask


Data Processing Pipeline Example: Logical Plan

Page 13: Apache Apex Meetup at Cask


Data Processing Pipeline Example: Physical Plan

Page 14: Apache Apex Meetup at Cask


Data Processing Pipeline Example: Real-Time Visualization

Page 15: Apache Apex Meetup at Cask


Resources: Apache Apex Community Page

Apache Apex LinkedIn Group

Page 16: Apache Apex Meetup at Cask


Resources


• Apache Apex - http://apex.apache.org/
• Subscribe - http://apex.apache.org/community.html
• Download - https://www.datatorrent.com/download/
• Twitter
  – @ApacheApex; Follow - https://twitter.com/apacheapex
  – @DataTorrent; Follow - https://twitter.com/datatorrent
• Meetups - http://www.meetup.com/topics/apache-apex
• Webinars - https://www.datatorrent.com/webinars/
• Videos - https://www.youtube.com/user/DataTorrent
• Slides - http://www.slideshare.net/DataTorrent/presentations
• Startup Accelerator Program - Full-featured enterprise product
  – https://www.datatorrent.com/product/startup-accelerator/

Page 17: Apache Apex Meetup at Cask


We Are Hiring


[email protected]
• Developers/Architects
• QA Automation Developers
• Information Developers
• Build and Release
• Community Leaders

Page 18: Apache Apex Meetup at Cask

End
