nyc storm meetup_robdoherty

Post on 06-May-2015

1.019 Views

Category:

Technology

0 Downloads

Preview:

Click to see full reader

DESCRIPTION

Presentation at NYC Storm Meetup #1 on the Kafka-Storm implementation used in production at Outbrain Engage to track thousands of web traffic pings per second.

TRANSCRIPT

Storm at

Rob DohertySenior Backend Engineerrdoherty@outbrain.com@robdoherty2

What is Outbrain?

Before Storm

● Custom distributed processing system● Python and ZMQ

● Advantages:○ Simple components○ Well-understood

● Disadvantages:○ Did not scale○ Batch-processing

Kafka + Storm

● Kafka: high-throughput distributed messaging● Storm: distributed, real-time computation

Why Kafka?

● Need method to buffer clicks into “stream”● Kafka + Storm common pattern for click tracking

Why Storm?

● “Real time” (15s latency requirements)● Fault tolerance● Easy to manage parallelism● Stream grouping● Active community● Open-source project

Nginx Servers

Kafka Cluster

Storm Topology

Elastic Load Balancer

Customer Traffic

AWS

MongoDB

Redis

Algo

API

Architecture

Kafka Cluster

● 40 Producers (8 m1.large instances)○ Python brod

● 4 Brokers (4 m1.large instances)

10k Clicks per second (peak)

14B Clicks per month

Kafka v0.7.2

Storm Topology

● 40 Supervisors (c1.xlarge instances)● 35 Bolts, 1 Kafka spout● 250+ Executors (worker threads)

160k+ tuples executed per second

Storm v0.82

Leiningen v1.7

Customer Traffic

Kafka Spout Aggregate 15s

Aggregate 5m

Position

Customer

Social

Arrangement

Front Page

@Handle

Storm Topology

Challenges

● Shell Bolts● Anchor Bolts/Replaying Stream● Acking Tuples● Monitoring

Monitoring

● Scribe Logging● Munin + Nagios● JMX-JMXTrans + Ganglia

● Storm UI● Thrift interface into Nimbus + D3

Monitoring

● Scribe Logging● Munin + Nagios● JMX-JMXTrans + Ganglia

● Storm UI● Thrift interface into Nimbus + D3

Future Plans

● Load testing● Break topology into smaller pieces● Move from AWS to private data center

Thank you

Rob DohertySenior Backend Engineerrdoherty@outbrain.com@robdoherty2

top related