ABDW17-Lightning Talks track-End to End Exactly Once Processing with Apache Apex

Apex Big Data World April 4, 2017 End to End Exactly Once Processing with Apache Apex

Upload: datatorrent

Post on 12-Apr-2017



TRANSCRIPT

Page 1: ABDW17-Lightning Talks track-End to End Exactly Once Processing with Apache Apex

Apex Big Data World

April 4, 2017

End to End Exactly Once Processing with Apache Apex

Page 2

About Me

Vlad Rozov
Apache Apex [email protected]
github.com/vrozov
Lead Engineer at [email protected] fan

Page 3

Apache Apex Platform Features

• In-memory, distributed stream processing
  ᵒ Application logic broken into components (operators) that execute distributed in a cluster
  ᵒ Unobtrusive Java API to express (custom) logic
  ᵒ Maintain state and metrics in member variables
  ᵒ Windowing, event-time processing
• Scalable, high throughput, low latency
  ᵒ Operators can be scaled up or down at runtime according to the load and SLA
  ᵒ Dynamic scaling (elasticity), compute locality
• Fault tolerance & correctness
  ᵒ Automatically recover from node outages without having to reprocess from beginning
  ᵒ State is preserved, checkpointing, incremental recovery
  ᵒ End-to-end exactly-once
• Operability
  ᵒ System and application metrics, record/visualize data
  ᵒ Dynamic changes and resource allocation, elasticity

Page 4

Fault Tolerance

• Operator state is checkpointed to persistent store
  ᵒ Automatically performed by engine, no additional coding needed
  ᵒ Asynchronous and distributed
  ᵒ In case of failure operators are restarted from checkpoint state
• Automatic detection and recovery of failed containers
  ᵒ Heartbeat mechanism
  ᵒ YARN process status notification
• Buffering to enable replay of data from recovered point
  ᵒ Fast, incremental recovery, spike handling
• Application master state checkpointed
  ᵒ Snapshot of physical (and logical) plan
  ᵒ Execution layer change log

Page 5

Checkpointing Operator State

• Save state of operator so that it can be recovered on failure
• Pluggable storage handler
• Default implementation
  ᵒ Serialization with Kryo
  ᵒ All non-transient fields serialized
  ᵒ Serialized state written to HDFS
  ᵒ Writes asynchronous, non-blocking
• Possible to implement custom handlers for an alternative approach to extracting state or a different storage backend (such as IMDG)
• Operators that do not rely on previous state for computation can be marked @Stateless to skip checkpointing
• Checkpoint frequency tunable (by default 30s)
  ᵒ Based on streaming windows for consistent state
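
The checkpointing idea on this slide can be sketched in a few lines. This is a toy model, not the Apex API: `InMemoryStorageAgent`, `CountingOperator`, `snapshot`, and `restore` are hypothetical names, JSON stands in for Kryo, and a dictionary stands in for HDFS. It only shows that non-transient fields are saved at a window boundary and that transient fields are rebuilt after a restore.

```python
import json


class InMemoryStorageAgent:
    """Hypothetical storage handler: keeps serialized state per (operator id, window id)."""

    def __init__(self):
        self._blobs = {}

    def save(self, state, operator_id, window_id):
        self._blobs[(operator_id, window_id)] = json.dumps(state).encode("utf-8")

    def load(self, operator_id, window_id):
        return json.loads(self._blobs[(operator_id, window_id)].decode("utf-8"))


class CountingOperator:
    """Toy operator. `count` is checkpointed; `cache` stands in for a transient field."""

    TRANSIENT = ("cache",)

    def __init__(self):
        self.count = 0
        self.cache = {}

    def process(self, item):
        self.count += 1
        self.cache[item] = True

    def snapshot(self):
        # Only non-transient fields go into the checkpoint.
        return {k: v for k, v in vars(self).items() if k not in self.TRANSIENT}

    @classmethod
    def restore(cls, state):
        op = cls()               # transient fields start empty again
        vars(op).update(state)   # checkpointed fields are put back
        return op


agent = InMemoryStorageAgent()
op = CountingOperator()
for item in ["a", "b", "c"]:
    op.process(item)
agent.save(op.snapshot(), operator_id=1, window_id=42)  # checkpoint at a window boundary
op.process("d")                                          # progress after the checkpoint

restored = CountingOperator.restore(agent.load(operator_id=1, window_id=42))
print(restored.count, restored.cache)
```

The restored operator sees the count as of window 42, so the tuple processed after the checkpoint has to be replayed by the platform. Swapping the dictionary for another backend changes only the storage class, which is the point of a pluggable handler.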

Page 6

Sample Failure Scenario

[Figure: a "sum" operator consuming the stream "… EW2, 1, 3, BW2, EW1, 4, 2, 1, BW1", where BW and EW mark the begin and end of a window. Sum values shown across the frames: 0, 7, (no value), 7, 10.]
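
One plausible reading of the figure, sketched as a small simulation: window 1 carries 1, 2, 4 and window 2 carries 3, 1; a checkpoint is taken at the end of window 1; the operator fails partway through window 2, is restored from the checkpoint, and window 2 is replayed. The class and function names are illustrative, and the exact failure point is an assumption.

```python
# Stream from the slide in arrival order: (window id, tuples in that window).
WINDOWS = [(1, [1, 2, 4]), (2, [3, 1])]


class SumOperator:
    """Toy stateful operator that keeps a running sum."""

    def __init__(self, total=0):
        self.total = total

    def process(self, value):
        self.total += value


def run_with_failure():
    trace = []
    op = SumOperator()
    trace.append(op.total)                 # initial state

    for value in WINDOWS[0][1]:            # window 1 completes
        op.process(value)
    checkpoint = (1, op.total)             # (last completed window, saved sum)
    trace.append(op.total)

    op.process(WINDOWS[1][1][0])           # first tuple of window 2, then failure
    trace.append(op.total)

    op = SumOperator(total=checkpoint[1])  # restore from the checkpoint
    trace.append(op.total)

    for window_id, values in WINDOWS:      # replay every window after the checkpoint
        if window_id > checkpoint[0]:
            for value in values:
                op.process(value)
    trace.append(op.total)
    return trace


trace = run_with_failure()
print(trace)
```

The partial sum computed before the failure is discarded, and the final value is the same as in a run with no failure because window 2 is recomputed in full from the checkpointed state.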

Page 7

Processing Guarantees

At most once
• Subscribes to data from the start of the next window.
• Ignore the lost windows and continue to process incoming data normally.
• No duplicates & no recomputation
• Possible missing data

At least once
• Operator brought back to its latest checkpointed state and the upstream buffer server replays all subsequent windows
• Lost windows are recomputed & application catches up with live incoming data
• Likely duplicates & recomputation
• No lost data

Exactly once
• Operator brought back to its latest checkpointed state and the upstream buffer server replays all subsequent windows
• Lost windows are recomputed in a logical way to have the effect as if computation has been done exactly once.
• No duplicates & recomputation?
• No lost data
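
The difference between the first two guarantees shows up as soon as output goes to a sink that simply appends. The sketch below is a hypothetical model, not Apex code: three windows of tuples, a checkpoint after window 1, and a failure after the first tuple of window 2 has already reached the sink. The `run` function and mode names are made up for illustration.

```python
WINDOWS = {1: ["a", "b"], 2: ["c", "d"], 3: ["e", "f"]}


def run(mode):
    """Return what an append-only sink holds after one failure and recovery."""
    sink = []

    sink.extend(WINDOWS[1])      # window 1 completes
    checkpointed_window = 1      # checkpoint taken at the end of window 1

    sink.append(WINDOWS[2][0])   # "c" reaches the sink, then the operator fails
    failed_window = 2

    if mode == "at_most_once":
        resume_from = failed_window + 1        # skip the lost window
    elif mode == "at_least_once":
        resume_from = checkpointed_window + 1  # replay everything after the checkpoint
    else:
        raise ValueError(f"unknown mode: {mode}")

    for window_id in sorted(WINDOWS):
        if window_id >= resume_from:
            sink.extend(WINDOWS[window_id])
    return sink


at_most = run("at_most_once")
at_least = run("at_least_once")
print(at_most)
print(at_least)
```

With at most once, "d" never reaches the sink. With at least once, nothing is lost but "c" is written twice, which is why exactly once needs cooperation from the operator that writes to the external system.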

Page 8

End-to-End Exactly Once

• Becomes important when writing to external systems
• Data should not be duplicated or lost in the external system even in case of application failures
• Common external systems
  ᵒ Databases
  ᵒ Files
  ᵒ Message queues
• Exactly-once = at-least-once + idempotency + consistent state
• Platform support for at least once is a must so that no data is lost
• Data duplication must still be avoided when data is replayed from checkpoint
• Operators implement the logic dependent on the external system
  ᵒ With the aid of platform features such as stateful checkpointing and windowing
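
The formula "at-least-once + idempotency + consistent state" can be illustrated with a sink that remembers the last window it committed. This is a minimal sketch under assumptions: `WindowedStore` is a hypothetical external store that commits a whole window together with its window id, and a replayed window is assumed to contain the same tuples as the original (the consistent state part).

```python
class WindowedStore:
    """Hypothetical external store that commits one window atomically with its id."""

    def __init__(self):
        self.rows = []
        self.last_committed_window = 0

    def commit(self, window_id, rows):
        # Idempotency: a window that was already committed is ignored on replay.
        if window_id <= self.last_committed_window:
            return False
        self.rows.extend(rows)
        self.last_committed_window = window_id
        return True


WINDOWS = {1: ["a", "b"], 2: ["c", "d"], 3: ["e", "f"]}

store = WindowedStore()
outcomes = []

outcomes.append(store.commit(1, WINDOWS[1]))
checkpointed_window = 1                       # operator checkpoint after window 1
outcomes.append(store.commit(2, WINDOWS[2]))  # window 2 is committed, then failure

# At-least-once recovery: every window after the checkpoint is replayed.
for window_id in sorted(WINDOWS):
    if window_id > checkpointed_window:
        outcomes.append(store.commit(window_id, WINDOWS[window_id]))

print(outcomes)
print(store.rows)
```

Window 2 is delivered twice but stored once, so the external system ends up with each tuple exactly once. How the window id is recorded depends on the external system, which is why this logic lives in the operator.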

Page 9

Files

• Streaming data is being written to file on a continuous basis
• Failure at a random point results in a file with an unknown amount of data
• Operator works with platform to ensure exactly once
• Platform responsibility
  ᵒ Restores state and restarts operator from an earlier checkpoint
  ᵒ Platform replays data from the exact point after checkpoint
• Operator responsibility
  ᵒ Replayed data doesn’t get duplicated in the file
  ᵒ Accomplished by keeping track of file offset as state
• Existing implementation from Malhar Library AbstractFileOutputOperator.java

Page 10

Files - Exactly Once Strategy

[Figure labels: File, Data, Offset, Chk]

• Operator saves file offset during checkpoint
• File contents are flushed before checkpoint to ensure there is no pending data in buffer
• On recovery platform restores the file offset value from checkpoint
• Operator truncates the file to the offset
• Starts writing data again
• Ensures no data is duplicated or lost
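
The truncate-to-offset strategy can be tried out on a local file. The sketch below is not the Malhar implementation; `OffsetTrackingWriter` is a hypothetical class that keeps the file offset as its only state, flushes before reporting a checkpoint, and truncates to the restored offset when it is reopened.

```python
import os
import tempfile


class OffsetTrackingWriter:
    """Toy file writer whose checkpointed state is the current file offset."""

    def __init__(self, path, offset=0):
        self.path = path
        self.offset = offset
        mode = "r+b" if os.path.exists(path) else "w+b"
        self._fh = open(path, mode)
        self._fh.truncate(offset)   # drop anything written after the checkpoint
        self._fh.seek(offset)

    def write(self, data):
        self._fh.write(data)
        self.offset += len(data)

    def checkpoint(self):
        # Flush first so the saved offset never points past data on disk.
        self._fh.flush()
        os.fsync(self._fh.fileno())
        return self.offset

    def close(self):
        self._fh.close()


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "out.txt")

    writer = OffsetTrackingWriter(path)
    writer.write(b"w1-a\n")
    writer.write(b"w1-b\n")
    saved_offset = writer.checkpoint()   # state saved with the checkpoint

    writer.write(b"w2-a\n")              # written after the checkpoint
    writer.close()                       # stands in for a crash at a random point

    # Recovery: the platform hands back the saved offset, then replays window 2.
    recovered = OffsetTrackingWriter(path, offset=saved_offset)
    recovered.write(b"w2-a\n")
    recovered.write(b"w2-b\n")
    recovered.checkpoint()
    recovered.close()

    with open(path, "rb") as fh:
        contents = fh.read()

print(contents)
```

Without the truncation step the replayed "w2-a" line would appear twice. Truncating to the checkpointed offset makes the replay idempotent, no matter how much data reached the file before the failure.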

Page 11

Questions?

Page 12

Resources

• Apache Apex - http://apex.apache.org/
• Subscribe to forums
  ᵒ Apex - http://apex.apache.org/community.html
  ᵒ DataTorrent - https://groups.google.com/forum/#!forum/dt-users
• Download - https://datatorrent.com/download/
• Twitter
  ᵒ @ApacheApex; Follow - https://twitter.com/apacheapex
  ᵒ @DataTorrent; Follow – https://twitter.com/datatorrent
• Meetups - http://meetup.com/topics/apache-apex
• Webinars - https://datatorrent.com/webinars/
• Videos - https://youtube.com/user/DataTorrent
• Slides - http://slideshare.net/DataTorrent/presentations
• Startup Accelerator – Free full featured enterprise product
  ᵒ https://datatorrent.com/product/startup-accelerator/
• Big Data Application Templates Hub – https://datatorrent.com/apphub

Page 13

We are hiring!

• [email protected]
• Developers/Architects
• QA Automation Developers
• Information Developers
• Build and Release
• Community Leaders