Transcript
Page 1: C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm

COMPLEX EVENT PROCESSING W/

CASSANDRA

Brian O’NeillLead Architect, Health Market [email protected]@boneill42

Taylor GoetzDevelopment Lead, Health Market [email protected]@ptgoetz

Page 2: C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm

Use CaseWhat is CEP? Why? What for?

Storm BackgroundCluster configuration

Examples / Demo Future : Trident

Agenda

Page 3: C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm

Our Products

Master Data ManagementGood, bad doctors?

Prescriber eligibility and remediation.

Page 4: C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm

Cassandra to the Rescue

1000’s of Feeds

Δt

C* Masterfile

Big Data for us == Variety of Data

Page 5: C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm

But…

Search unstructured data Real-time Analytics / Reporting Transactional Processing

Changes reflected immediately.Wide-row Indexes

Page 6: C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm

What might that look like?

C*

RDBMS

I’m happy

Dro

pwiz

ard

wid

e-ro

w in

dex

Provide for Polyglot Persistence

Page 7: C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm

What we did wrong… (part 1)

Could not react to transactional changes Needed extra logic to track what changed Took too long

Page 8: C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm

What we did wrong… (part 2)

Good IntentionsTake the onus off the clients

Bad Result Guaranteeing executionWrite Overhead

C*Wide Row….Indexing

TRIGGERS

Page 9: C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm

What is Complex Event Processing? Event processing is a method of tracking and

analyzing (processing) streams of information (data) about things that happen (events),[1] and deriving a conclusion from them.

Complex event processing, or CEP, is event processing that combines data from multiple sources[2] to infer events or patterns that suggest more complicated circumstances.

http://en.wikipedia.org/wiki/Complex_event_processing

Page 10: C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm

Imagine…

Treating CRUD operations as events in a system.

Then Suddenly,

CEP = (ETL or Analytics)

Page 11: C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm

What Storm is to us…

CrudOp

A High Throughput Data Processing Pipeline

RDBMSSoR

Dimensional Counts

ETL Enrichment

Fuzzy Index

Page 12: C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm

Enter Storm

Page 13: C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm

Storm Overview

Open-Sourced by Twitter in 2011 Distributed Realtime Computation System Fault Tolerant Highly Scalable Guaranteed Processing Operates on one or more streams of data (i.e.

CEP)

Page 14: C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm

Anatomy of a Storm Cluster

NimbusMaster Node

ZookeeperCluster Coordination

SupervisorsWorker Nodes

Page 15: C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm

Storm Components

SpoutsStream Sources

BoltsUnit of Computation

TopologiesCombination of n Spouts and n BoltsDefines the overall “Computation”

Page 16: C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm

Storm Spouts

Represents a source (stream) of dataQueues (JMS, Kafka, Kestrel, etc.)Twitter FirehoseSensor Data

Emits “Tuples” (Events) based on sourcePrimary Storm data structureSet of Key-Value pairs

Page 17: C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm

Storm Bolts

Receive Tuples from Spouts or other Bolts Operate on, or React to Data

Functions/Filters/Joins/AggregationsDatabase writes/lookups

Optionally emit additional Tuples

Page 18: C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm

Storm Topologies

Data flow between spouts and bolts Routing of Tuples between spouts/bolts

Stream “Groupings” Parallelism of Components Long-Lived

Page 19: C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm

Storm and Cassandra

Use Cases:Write Storm Tuple data to C*

○ Computation Results○ Pre-computed indices

Read data from C* and emit Storm Tuples○ Dynamic Lookups

http://github.com/hmsonline/storm-cassandra

Page 20: C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm

Storm Cassandra Bolt Types

CassandraBolt

C*CassandraLookupBolt

STORM

CassandraBoltWrites data to CassandraAvailable in Batching and Non-Batching

CassandraLookupBoltReads data from Cassandra

http://github.com/hmsonline/storm-cassandra

Page 21: C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm

Storm-Cassandra Project

Provides generic Bolts for writing/reading Storm Tuples to/from C*

TupleTuple

Mapper Rows

C*TuplesColumnsMapper Columns

STORM

http://github.com/hmsonline/storm-cassandra

Page 22: C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm

Storm-Cassandra Project

TupleMapper InterfaceTells the CassandraBolt how to write a tuple to an

arbitrary data model

Given a Storm Tuple:Map to Column FamilyMap to Row KeyMap to Columns

http://github.com/hmsonline/storm-cassandra

Page 23: C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm

Storm-Cassandra Project

ColumnsMapper InterfaceTells the CassandraLookupBolt how to transform a

C* row into a Storm Tuple

Given a C* Row Key and list of Columns:Return a list of Storm Tuples

http://github.com/hmsonline/storm-cassandra

Page 24: C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm

Storm-Cassandra Project Current State:

Version 0.4.0-WIPUses Astyanax ClientSeveral out-of-the-box *Mapper Implementations:

○ Basic Key-Value Columns○ Value-less Columns○ Counter Columns○ Lookup by row key○ Lookup by range query

Initial pass at Trident supportInitial pass at Composite Column Support

http://github.com/hmsonline/storm-cassandra

Page 25: C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm

Storm-Cassandra Project

Future Plans:Switch to CQL (???)Full Trident Support

http://github.com/hmsonline/storm-cassandra

Page 26: C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm

Word Count Demo

http://github.com/hmsonline/storm-cassandra

Page 27: C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm

DRPC

Page 28: C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm

Reach Demo

Page 29: C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm

Next Level : Trident

Page 30: C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm

Trident

Provides a higher-level abstraction for stream processingConstructs for state management and Batching

Adds additional primitives that abstract away common topological patterns

Deprecates transactional topologies Distributes with Storm

Page 31: C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm

Sample Trident Operations Partition Local

Functions ( execute(x) x + y )Filters ( isKeep(x) 0,x )PartitionAggregate

○ Combiner ( pairwise combining )○ Reducer ( iterative accumulation )○ Aggregator ( byoa )

Page 32: C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm

A sample topologyTridentTopology topology = new TridentTopology(); TridentState wordCounts = topology.newStream("spout1", spout) .each(new Fields("sentence"),

new Split(), new Fields("word"))

.groupBy(new Fields("word")) .persistentAggregate(

MemcachedState.opaque(serverLocations), new Count(), new Fields("count"))

.parallelismHint(6);

https://github.com/nathanmarz/storm/wiki/Trident-state

Page 33: C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm

Trident StateSequenced writes by batch/transaction id. Spouts

Transactional○ Batch contents never change

Opaque○ Batch contents can change

StateTransactional

○ Store tx_id with counts to maintain sequencing of writes.Opaque

○ Store previous value in order to overwrite the current value when contents of a batch change.

Page 34: C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm

Shameless Shoutouts

HMS (https://github.com/hmsonline/)storm-cassandrastorm-elastic-searchstorm-jdbi (coming soon)

ptgoetz (https://github.com/ptgoetz) storm-jmsstorm-signals

Page 35: C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm

Brian O’NeillLead Architect, Health Market [email protected]@boneill42

Taylor GoetzDevelopment Lead, Health Market [email protected]@ptgoetz

THANKS!


Top Related