Apache Apex Meetup at Cask

Post on 12-Jan-2017

Thomas Weise <thomas@datatorrent.com>
Dec 2nd, 2015

Introduction to Open Source Unified Streaming and Fast Batching Platform
Apache Apex

© 2015 DataTorrent

Apex Platform Overview

Apache Malhar Library

Native Hadoop Integration

• YARN is the resource manager

• HDFS used for storing any persistent state

Application Programming Model

A Stream is a sequence of data tuples
An Operator takes one or more input streams, performs computations, and emits one or more output streams

• Each Operator is YOUR custom business logic in Java, or a built-in operator from our open source library
• An Operator has many instances that run in parallel, and each instance is single-threaded

A Directed Acyclic Graph (DAG) is made up of operators and streams
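The programming model above can be illustrated with a minimal plain-Java sketch. It is not the Apex API: the class `DagSketch`, the `Operator` interface, and the `filter`/`map` helpers are hypothetical names chosen for illustration, and the "streams" are just direct method calls from one operator to the next.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical plain-Java model of the slide's terms; NOT the Apex API.
class DagSketch {

    /** An operator consumes one tuple at a time and emits zero or more tuples downstream. */
    interface Operator<I, O> {
        void process(I tuple, Consumer<O> out);
    }

    /** Operator that forwards only tuples matching a predicate (produces a filtered stream). */
    static <T> Operator<T, T> filter(Predicate<T> keep) {
        return (tuple, out) -> {
            if (keep.test(tuple)) {
                out.accept(tuple);
            }
        };
    }

    /** Operator that transforms every tuple (produces an enriched stream). */
    static <I, O> Operator<I, O> map(Function<I, O> fn) {
        return (tuple, out) -> out.accept(fn.apply(tuple));
    }

    /** Wires input -> filter -> enrich -> output: a linear DAG of two operators. */
    static List<String> run(List<Integer> input) {
        Operator<Integer, Integer> evens = filter(n -> n % 2 == 0);
        Operator<Integer, String> enrich = map(n -> "even:" + n);
        List<String> output = new ArrayList<>();
        for (Integer tuple : input) {
            // Each emitted tuple flows along the stream to the next operator.
            evens.process(tuple, filtered -> enrich.process(filtered, output::add));
        }
        return output;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList(1, 2, 3, 4)));
    }
}
```

Each operator here sees one tuple at a time, which mirrors the single-threaded operator instance described on the slide; parallelism would come from running several such instances side by side.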

[Figure: Directed Acyclic Graph (DAG) in which operators are connected by streams (filtered stream, enriched stream, output stream) that carry tuples]

Application Specification

Partitioning and Scaling Out

• Operators can be dynamically scaled
• Flexible stream splits
• Parallel partitioning
• MxN partitioning
• Unifiers
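As a rough illustration of how partitions and a unifier fit together, the plain-Java sketch below splits a word stream round-robin across N partitions, lets each partition count independently, and then merges the partial counts. The class and method names are hypothetical, the round-robin split is an assumption made for the example, and none of this is the Apex partitioning API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of partition + unify; NOT the Apex API.
class PartitionSketch {

    /** Splits the stream round-robin over n partitions; each partition counts its own words. */
    static List<Map<String, Integer>> countPerPartition(List<String> words, int n) {
        List<Map<String, Integer>> partials = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            partials.add(new HashMap<>());
        }
        for (int i = 0; i < words.size(); i++) {
            // Tuple i is routed to partition i % n.
            partials.get(i % n).merge(words.get(i), 1, Integer::sum);
        }
        return partials;
    }

    /** Unifier: merges the partial counts from all partitions into one result. */
    static Map<String, Integer> unify(List<Map<String, Integer>> partials) {
        Map<String, Integer> total = new HashMap<>();
        for (Map<String, Integer> partial : partials) {
            partial.forEach((word, count) -> total.merge(word, count, Integer::sum));
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> words = List.of("a", "b", "a", "c", "a", "b");
        System.out.println(unify(countPerPartition(words, 3)));
    }
}
```

Because the same word can land in several partitions, no single partition holds the full count; the unifier step is what restores a single consistent answer for the downstream operator.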

Advanced Windowing Support

• Application window
• Sliding window and tumbling window
• Checkpoint window
• No artificial latency
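The difference between tumbling and sliding windows can be shown with a small plain-Java sketch. It assumes the stream has already been cut into fixed units with one aggregated value per unit; that input shape, and the names `WindowSketch`, `tumbling`, and `sliding`, are assumptions for illustration and not part of the Apex API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of window aggregation; NOT the Apex API.
class WindowSketch {

    /** Sums `size` consecutive values starting at index `start`. */
    private static int sum(List<Integer> values, int start, int size) {
        int total = 0;
        for (int i = start; i < start + size; i++) {
            total += values.get(i);
        }
        return total;
    }

    /** Tumbling window: consecutive, non-overlapping groups of `size` units. */
    static List<Integer> tumbling(List<Integer> perUnit, int size) {
        List<Integer> out = new ArrayList<>();
        for (int start = 0; start + size <= perUnit.size(); start += size) {
            out.add(sum(perUnit, start, size));
        }
        return out;
    }

    /** Sliding window: groups of `size` units that advance by one unit at a time. */
    static List<Integer> sliding(List<Integer> perUnit, int size) {
        List<Integer> out = new ArrayList<>();
        for (int start = 0; start + size <= perUnit.size(); start++) {
            out.add(sum(perUnit, start, size));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> perUnit = Arrays.asList(1, 2, 3, 4, 5, 6);
        System.out.println("tumbling: " + tumbling(perUnit, 3)); // [6, 15]
        System.out.println("sliding:  " + sliding(perUnit, 3));  // [6, 9, 12, 15]
    }
}
```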

Platform Features

Stateful Fault Tolerance
• Supported out of the box
– Application state
– Application master state
– No data loss
• Automatic recovery
• Lunch test
• Buffer server

Processing Semantics
• At least once
• At most once
• Exactly once

Data Locality
• Stream locality for placement of operators
– Rack local – Distributed deployment
– Node local – Data does not traverse NIC
– Container local – Data doesn’t need to be serialized
– Thread local – Operators run in same thread
• Data locality
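One way to picture stateful recovery is the plain-Java sketch below: an operator's state is checkpointed every few windows, and after a simulated failure the state is restored from the last checkpoint and the later windows are replayed. The replay-from-upstream behaviour, the class `RecoverySketch`, and all of its members are assumptions made for illustration; this is not the Apex checkpointing API.

```java
// Hypothetical sketch of checkpoint, restore, and replay; NOT the Apex API.
class RecoverySketch {

    /** Operator state: a running count plus the last checkpointed copy of it. */
    static final class Counter {
        long count;
        long checkpointedCount = 0;
        int checkpointedWindow = -1; // -1 means no checkpoint has been taken yet

        void process(int tuplesInWindow) {
            count += tuplesInWindow;
        }

        void checkpoint(int windowId) {
            checkpointedCount = count;
            checkpointedWindow = windowId;
        }

        void restore() {
            count = checkpointedCount;
        }
    }

    /**
     * Processes windows in order, checkpointing every `checkpointEvery` windows.
     * A single failure is simulated after window `failAfterWindow`; the state is
     * restored and every window after the last checkpoint is replayed.
     */
    static long runWithFailure(int[] windows, int checkpointEvery, int failAfterWindow) {
        Counter counter = new Counter();
        boolean failed = false;
        int w = 0;
        while (w < windows.length) {
            counter.process(windows[w]);
            if ((w + 1) % checkpointEvery == 0) {
                counter.checkpoint(w);
            }
            if (!failed && w == failAfterWindow) {
                failed = true;
                counter.restore();                  // drop state built since the checkpoint
                w = counter.checkpointedWindow + 1; // replay from the next window
                continue;
            }
            w++;
        }
        return counter.count;
    }

    public static void main(String[] args) {
        int[] windows = {1, 2, 3, 4, 5};
        System.out.println(runWithFailure(windows, 2, 2)); // 15, same as a run without failure
    }
}
```

The restored state ends up identical to a failure-free run, but the replayed windows are processed twice. Anything emitted downstream during those windows would be seen more than once, which corresponds to the at-least-once case on the slide.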

Dynamic Updates

• Dynamic topology updates
– Properties of operators can be changed
– New operators can be added

Data Processing Pipeline Example: App Builder

Data Processing Pipeline Example: Logical Plan

Data Processing Pipeline Example: Physical Plan

Data Processing Pipeline Example: Real Time Visualization

Resources

• Apache Apex Community Page
• Apache Apex LinkedIn Group
• Apache Apex - http://apex.apache.org/
• Subscribe - http://apex.apache.org/community.html
• Download - https://www.datatorrent.com/download/
• Twitter
ᵒ @ApacheApex; Follow - https://twitter.com/apacheapex
ᵒ @DataTorrent; Follow - https://twitter.com/datatorrent
• Meetups - http://www.meetup.com/topics/apache-apex
• Webinars - https://www.datatorrent.com/webinars/
• Videos - https://www.youtube.com/user/DataTorrent
• Slides - http://www.slideshare.net/DataTorrent/presentations
• Startup Accelerator Program - Full featured enterprise product
ᵒ https://www.datatorrent.com/product/startup-accelerator/

We Are Hiring

• jobs@datatorrent.com
• Developers/Architects
• QA Automation Developers
• Information Developers
• Build and Release
• Community Leaders

End
