Storm and Cassandra

Cassandra NYC Meetup, 11/5/2013, Jake Luciani (@tjake)

Upload: t-jake-luciani

Post on 09-May-2015

DESCRIPTION

Slides from a talk given at the NYC Cassandra Meetup. The talk discusses how Storm works and how it integrates with Apache Cassandra. There is also a segue into an example project that uses Storm and Cassandra to implement a scalable, reactive web crawler: http://github.com/tjake/stormscraper

TRANSCRIPT

Page 1: Storm and Cassandra

Storm and Cassandra

Cassandra NYC Meetup 11/5/2013

Jake Luciani (@tjake)

Page 2: Storm and Cassandra

What is Storm?

• Distributed event processor

• Provides constructs to reliably process all events

• Simple conceptual model

• New to Apache Incubator: http://wiki.apache.org/incubator/StormProposal

Page 3: Storm and Cassandra

Storm Concepts

Spout - Collects work and submits it to be processed. Tracks the success or failure of each tuple.

Bolt - Processes tuples and optionally emits more tuples.

Tuple - A collection of data that is passed within Storm.

Stream - Identifies outputs from a Spout/Bolt. Forces tuples to have some declared structure.
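The four concepts above can be sketched as a toy model in plain Python. This is not Storm's API: `Stream`, `ListSpout`, `length_bolt` and `run` are hypothetical names invented for illustration, and the whole thing runs in a single process. It only shows how a stream's declared fields constrain tuples and how a spout tracks success or failure per tuple.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stream:
    """A named output with a declared structure (hypothetical, not Storm's class)."""
    name: str
    fields: tuple

    def make_tuple(self, *values):
        # Reject tuples that do not match the declared fields.
        if len(values) != len(self.fields):
            raise ValueError(f"stream {self.name!r} expects fields {self.fields}")
        return dict(zip(self.fields, values))


class ListSpout:
    """Collects work, emits it as tuples, and tracks success or failure."""

    def __init__(self, stream, items):
        self.stream = stream
        self.pending = {msg_id: item for msg_id, item in enumerate(items)}
        self.acked = []
        self.failed = []

    def emit_all(self):
        # Iterate over a snapshot so ack() can remove entries while we loop.
        for msg_id, item in list(self.pending.items()):
            yield msg_id, self.stream.make_tuple(item)

    def ack(self, msg_id):
        self.acked.append(self.pending.pop(msg_id))

    def fail(self, msg_id):
        # A failed item stays pending so it could be replayed later.
        self.failed.append(self.pending[msg_id])


def length_bolt(tup, out_stream):
    """Processes one tuple and emits one new tuple on out_stream."""
    word = tup["word"]
    if not isinstance(word, str):
        raise TypeError("word must be a string")
    return [out_stream.make_tuple(word, len(word))]


def run(spout, bolt, out_stream):
    """Push every pending tuple through the bolt, acking or failing each one."""
    results = []
    for msg_id, tup in spout.emit_all():
        try:
            results.extend(bolt(tup, out_stream))
            spout.ack(msg_id)
        except Exception:
            spout.fail(msg_id)
    return results
```

Feeding the spout the items `"storm"`, `42` and `"bolt"` would ack the two strings and leave `42` pending as a failure, since the bolt rejects non-string words.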

Page 4: Storm and Cassandra

Storm Topologies

A directed graph of spouts and bolts connected via streams.

[Diagram labels: Host A; Host B; Host C; Zookeeper; A-F; G-P; Q-Z; Firehose Cassandra (optional)]

Page 5: Storm and Cassandra

Example Topologies

• Track the top 10 most popular links being shared in the last N minutes.
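The counting logic behind such a topology can be sketched in plain Python. This is an illustration only, not code from the talk: `SlidingTopLinks` is a hypothetical name, the window is given in seconds, and the sketch assumes timestamps arrive in non-decreasing order. In a real topology this state would live inside a bolt.

```python
from collections import Counter, deque


class SlidingTopLinks:
    """Counts link shares seen within the last `window_seconds` seconds."""

    def __init__(self, window_seconds, top_n=10):
        self.window_seconds = window_seconds
        self.top_n = top_n
        self.events = deque()    # (timestamp, url), oldest first
        self.counts = Counter()  # url -> shares currently inside the window

    def _expire(self, now):
        # Drop events that have fallen out of the window.
        cutoff = now - self.window_seconds
        while self.events and self.events[0][0] <= cutoff:
            _, url = self.events.popleft()
            self.counts[url] -= 1
            if self.counts[url] == 0:
                del self.counts[url]

    def record(self, url, timestamp):
        self.events.append((timestamp, url))
        self.counts[url] += 1
        self._expire(timestamp)

    def top(self, now):
        self._expire(now)
        return self.counts.most_common(self.top_n)
```

Keeping the raw events in a deque makes expiry exact at the cost of memory proportional to the number of shares in the window; a bucketed counter would trade precision for space.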

Page 6: Storm and Cassandra

Where does data end up?

• Storm supports built-in RPC, so client requests can effectively become a spout.

• Put the data into a database…

• Why Cassandra though?

Page 7: Storm and Cassandra

Why Cassandra?

• Cassandra’s data model allows incremental modifications to rows.

• Different bolts can update different parts of a Cassandra row asynchronously.
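A toy in-memory model can illustrate why this works. It is a sketch, not Cassandra's implementation: `ToyRow` is a hypothetical class in which each column keeps its own write timestamp, so writers that touch different columns never overwrite each other, whatever order their writes arrive in. Resolving equal timestamps in favour of the later call is a simplification made here.

```python
class ToyRow:
    """In-memory stand-in for one row; each column stores (timestamp, value)."""

    def __init__(self):
        self.cells = {}

    def upsert(self, timestamp, **columns):
        # Only the named columns are touched; per column, the newest timestamp wins.
        for name, value in columns.items():
            current = self.cells.get(name)
            if current is None or timestamp >= current[0]:
                self.cells[name] = (timestamp, value)

    def read(self):
        return {name: value for name, (_, value) in self.cells.items()}
```

Three writers could each call `upsert` with only their own column (say `html`, `text` or `title`) and a reader would still see one merged row.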

Page 8: Storm and Cassandra

Example

Page 9: Storm and Cassandra

StormScraper

A web crawling system built on Storm + Cassandra

http://github.com/tjake/stormscraper

Page 10: Storm and Cassandra

StormScraper C* Data Model

    CREATE TABLE pages (
        url text,
        scrape_date timestamp,
        title text,
        html text,
        text text,
        inbound_links set<text>,
        outbound_links set<text>,
        PRIMARY KEY (url, scrape_date)
    );

    CREATE TABLE scrape_list (
        url text PRIMARY KEY,
        last_update timestamp,
        depth int
    );
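As a sketch of how separate writers might each touch only their own columns of `pages`, the helpers below build parameterised CQL statements against the table above. The helper names and the split of columns between writers are assumptions made for illustration, not taken from the stormscraper code. Nothing here talks to a cluster; each helper returns a statement string and its bind values.

```python
# Hypothetical statements: which writer owns which column is an assumption.
HTML_WRITE = "UPDATE pages SET title = ?, html = ? WHERE url = ? AND scrape_date = ?"
TEXT_WRITE = "UPDATE pages SET text = ? WHERE url = ? AND scrape_date = ?"
LINK_WRITE = (
    "UPDATE pages SET outbound_links = outbound_links + ? "
    "WHERE url = ? AND scrape_date = ?"
)


def html_write(url, scrape_date, title, html):
    """Statement and bind values for storing the raw page."""
    return HTML_WRITE, (title, html, url, scrape_date)


def text_write(url, scrape_date, text):
    """Statement and bind values for storing the extracted text."""
    return TEXT_WRITE, (text, url, scrape_date)


def link_write(url, scrape_date, links):
    """Statement and bind values for adding links to the outbound_links set."""
    return LINK_WRITE, (set(links), url, scrape_date)
```

Because every statement names only the columns it owns and uses the full primary key, the three writes can be issued in any order and still land in the same row.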

Pages 11-30: Storm and Cassandra

StormScraper Topology

[Diagram, built up one component per slide. Components in order of appearance: Cassandra, Url Spout, Scraper Bolt, Html Writer, Link Writer, Text Extraction Bolt, Text Writer. The last three slides add a "Fail" label.]

Page 31: Storm and Cassandra

Code Walkthrough

http://github.com/tjake/stormscraper

Page 32: Storm and Cassandra

Storm Summary

• Powerful

• But easy to make mistakes

• Wrong tuple expectation, names, types

• Bad topology wiring
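One way to catch both kinds of mistake before deploying is to check the wiring against the declared fields. The sketch below is plain Python, not part of Storm: `check_wiring` and its input shapes are hypothetical, and the field names in the usage example are made up for illustration.

```python
def check_wiring(declared, subscriptions):
    """Report consumers wired to unknown sources or to undeclared fields.

    declared:      component name -> set of output field names
    subscriptions: list of (consumer, source, required_fields)
    """
    problems = []
    for consumer, source, required in subscriptions:
        if source not in declared:
            problems.append(f"{consumer}: unknown source {source!r}")
            continue
        missing = sorted(set(required) - set(declared[source]))
        if missing:
            problems.append(f"{consumer}: {source!r} does not declare {missing}")
    return problems


# Usage with component names from the deck and assumed field names.
declared = {"url_spout": {"url"}, "scraper_bolt": {"url", "html"}}
subscriptions = [
    ("scraper_bolt", "url_spout", ["url"]),
    ("html_writer", "scraper_bolt", ["url", "html"]),
    ("text_writer", "text_extraction_bolt", ["url", "text"]),
    ("link_writer", "scraper_bolt", ["url", "links"]),
]
for problem in check_wiring(declared, subscriptions):
    print(problem)
```

Here the check flags `text_writer` (its source was never declared) and `link_writer` (it expects a `links` field that `scraper_bolt` does not emit).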

Page 33: Storm and Cassandra

Thank You! Q&A?