apache apex & apace geode in-memory computation, storage & analysis

16
Apache Apex + Apache Geode In-Memory Streaming, Storage & Analytics Ashish Tadose

Upload: apache-apex

Post on 18-Jan-2017

229 views

Category:

Technology


1 download

TRANSCRIPT

Page 1: Apache Apex & Apace Geode In-Memory Computation, Storage & Analysis

Apache Apex + Apache GeodeIn-Memory Streaming, Storage & Analytics

Ashish Tadose

Page 2: Apache Apex & Apace Geode In-Memory Computation, Storage & Analysis

Streaming meets In Memory Data Grid

Page 3: Apache Apex & Apace Geode In-Memory Computation, Storage & Analysis

Apache Geode: Listeners• CacheWriter / CacheListener• AsyncEventListener (queue / batch)

• Parallel or Serial• Conflation

3

Page 4: Apache Apex & Apace Geode In-Memory Computation, Storage & Analysis

Apache Geode: Events & NotificationsRegister Interest•Individual Keys OR RegEx for Keys•Updates Local Copy•Examples:

• region.registerInterest(“key-1”);• region1.registerInterestRegex(“[a-z]+“);

Continuous Query•Receive Notification when Query condition met on server•Example:– SELECT * FROM /tradeOrder t WHERE t.price > 100.00

Can be DURABLE

4

Page 5: Apache Apex & Apace Geode In-Memory Computation, Storage & Analysis

Apex: Checkpointing Today

er

Operator

er

Operator

er

Operator

Filtered

Stream

Filtered

Stream

er

OperatorInput

Stream

Enriched

Stream

Enriched

Stream

Output

Stream

Checkpoint

State

Checkpoint

State

Checkpoint

State

Persistence

In-Memory

Page 6: Apache Apex & Apace Geode In-Memory Computation, Storage & Analysis

Apex: Checkpointing with Geode

er

Operator

er

Operator

er

Operator

Filtered

Stream

Filtered

Stream

er

OperatorInput

Stream

Enriched

Stream

Enriched

Stream

Output

Stream

Checkpoint

State

Checkpoint

State

Checkpoint

State

Persistence

In-Memory

Page 7: Apache Apex & Apace Geode In-Memory Computation, Storage & Analysis

Operator Checkpointing in Geode

Apex Operator check-pointing in an IMDG (Geode store)

•Checkpointing is an essential mechanism to ensure Fault Tolerance•Apex checkpoints operator state to HDFS•Slower HDFS checkpointing hurts application performance•Checkpointing in Geode ensures that application performance is not impacted •Geode has better latency for write operations than HDFS.

Implementation: GeodeStorageAgent

https://issues.apache.org/jira/browse/APEXCORE-283

Page 8: Apache Apex & Apace Geode In-Memory Computation, Storage & Analysis

Apex: Input/Output Operators

er

Operatorer

OutputOperator

Input

StreamOutput

Stream

Checkpoint

State

Checkpoint

State

Data Store

In-Memory

No SQL Database

Page 9: Apache Apex & Apace Geode In-Memory Computation, Storage & Analysis

er

Operatorer

Geode Output

Input

StreamOutput

Stream

Checkpoint

State

Checkpoint

State

Data Store

In-Memory

Geode Output Operator

• Built-in OQL support

• Visualization support

• Persistence options

• Transaction support

Page 10: Apache Apex & Apace Geode In-Memory Computation, Storage & Analysis

Data Streams to Geode Store

Page 11: Apache Apex & Apace Geode In-Memory Computation, Storage & Analysis

Apex + Geode: Future Integrations

• Geode output operator with transactional support

• Input Operator: Ingest data from Geode to Apex DAG

• Distributed Cache Operator

• Scan Operator: Parallel query execution & result retrieval

Page 12: Apache Apex & Apace Geode In-Memory Computation, Storage & Analysis

Geode Transaction Operator

Apex Output Operator to write to Geode store with Transactions

•Apex DAG uses TransactionableStore to provide guarantee that records are written are exactly once. E.g. JdbcTransactionalStore

•Geode provides transaction support for efficient and safe coordinated operations•Geode store using transactions guarantee that records are written exactly once•Put operator backed by GeodeTransactional store can help to achieve Exactly once semantics

Implementation: GeodeWindowStore as TransactionableStore

Proposed

Page 13: Apache Apex & Apace Geode In-Memory Computation, Storage & Analysis

Input Operator: Streaming Geode data

Apex Input Operator to read from Geode store

•Apex Input operators – Ingest data from external sources into Apex DAG

•Geode provides versatile and reliable event distribution to provide Real Time updates to data• Use case – Apex operator to stream async events from Geode in DAG• Call back events reduce polling cycles over network

Implementation: GeodeRegionStreamOperator receives a newly added tuples and emits in DAG

Proposed

Page 14: Apache Apex & Apace Geode In-Memory Computation, Storage & Analysis

Geode Cache Operator

Apex+Geode Cache Operator

•Geode provides efficient Events & Notifications • Register interest – update local copies • Continuous Query

• Receive notification when Query condition met on server• Eg.g SELECT * FROM /tradeOrder t WHERE t.price > 100.00

•Use Geode events notification framework to maintain & invalidate cache.

Implementation: GeodeCacheOperatormaintains consistent cache based on subscribed keyset/query

Proposed

Page 15: Apache Apex & Apace Geode In-Memory Computation, Storage & Analysis

Geode Scan Operator

Apex+Geode Scan Operator

•Function Execution provides Parallel Query Execution•MapReduce like execution - concurrent execution on members & results are collected from members & sent to caller. •Use case: Streaming application depending on large scan result from external store

Implementation: GeodeQueryOperator execute data dependent queries on distributed regionemit results in DAG

Proposed

Page 16: Apache Apex & Apace Geode In-Memory Computation, Storage & Analysis

Questions ???

Thank You …