apache apex & apace geode in-memory computation, storage & analysis

Post on 18-Jan-2017

230 Views

Category:

Technology

1 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Apache Apex + Apache GeodeIn-Memory Streaming, Storage & Analytics

Ashish Tadose

Streaming meets In Memory Data Grid

Apache Geode: Listeners• CacheWriter / CacheListener• AsyncEventListener (queue / batch)

• Parallel or Serial• Conflation

3

Apache Geode: Events & NotificationsRegister Interest•Individual Keys OR RegEx for Keys•Updates Local Copy•Examples:

• region.registerInterest(“key-1”);• region1.registerInterestRegex(“[a-z]+“);

Continuous Query•Receive Notification when Query condition met on server•Example:– SELECT * FROM /tradeOrder t WHERE t.price > 100.00

Can be DURABLE

4

Apex: Checkpointing Today

er

Operator

er

Operator

er

Operator

Filtered

Stream

Filtered

Stream

er

OperatorInput

Stream

Enriched

Stream

Enriched

Stream

Output

Stream

Checkpoint

State

Checkpoint

State

Checkpoint

State

Persistence

In-Memory

Apex: Checkpointing with Geode

er

Operator

er

Operator

er

Operator

Filtered

Stream

Filtered

Stream

er

OperatorInput

Stream

Enriched

Stream

Enriched

Stream

Output

Stream

Checkpoint

State

Checkpoint

State

Checkpoint

State

Persistence

In-Memory

Operator Checkpointing in Geode

Apex Operator check-pointing in an IMDG (Geode store)

•Checkpointing is an essential mechanism to ensure Fault Tolerance•Apex checkpoints operator state to HDFS•Slower HDFS checkpointing hurts application performance•Checkpointing in Geode ensures that application performance is not impacted •Geode has better latency for write operations than HDFS.

Implementation: GeodeStorageAgent

https://issues.apache.org/jira/browse/APEXCORE-283

Apex: Input/Output Operators

er

Operatorer

OutputOperator

Input

StreamOutput

Stream

Checkpoint

State

Checkpoint

State

Data Store

In-Memory

No SQL Database

er

Operatorer

Geode Output

Input

StreamOutput

Stream

Checkpoint

State

Checkpoint

State

Data Store

In-Memory

Geode Output Operator

• Built-in OQL support

• Visualization support

• Persistence options

• Transaction support

Data Streams to Geode Store

Apex + Geode: Future Integrations

• Geode output operator with transactional support

• Input Operator: Ingest data from Geode to Apex DAG

• Distributed Cache Operator

• Scan Operator: Parallel query execution & result retrieval

Geode Transaction Operator

Apex Output Operator to write to Geode store with Transactions

•Apex DAG uses TransactionableStore to provide guarantee that records are written are exactly once. E.g. JdbcTransactionalStore

•Geode provides transaction support for efficient and safe coordinated operations•Geode store using transactions guarantee that records are written exactly once•Put operator backed by GeodeTransactional store can help to achieve Exactly once semantics

Implementation: GeodeWindowStore as TransactionableStore

Proposed

Input Operator: Streaming Geode data

Apex Input Operator to read from Geode store

•Apex Input operators – Ingest data from external sources into Apex DAG

•Geode provides versatile and reliable event distribution to provide Real Time updates to data• Use case – Apex operator to stream async events from Geode in DAG• Call back events reduce polling cycles over network

Implementation: GeodeRegionStreamOperator receives a newly added tuples and emits in DAG

Proposed

Geode Cache Operator

Apex+Geode Cache Operator

•Geode provides efficient Events & Notifications • Register interest – update local copies • Continuous Query

• Receive notification when Query condition met on server• Eg.g SELECT * FROM /tradeOrder t WHERE t.price > 100.00

•Use Geode events notification framework to maintain & invalidate cache.

Implementation: GeodeCacheOperatormaintains consistent cache based on subscribed keyset/query

Proposed

Geode Scan Operator

Apex+Geode Scan Operator

•Function Execution provides Parallel Query Execution•MapReduce like execution - concurrent execution on members & results are collected from members & sent to caller. •Use case: Streaming application depending on large scan result from external store

Implementation: GeodeQueryOperator execute data dependent queries on distributed regionemit results in DAG

Proposed

Questions ???

Thank You …

top related