nyc storm meetup_robdoherty

19
Storm at Rob Doherty Senior Backend Engineer [email protected] @robdoherty2

Upload: robert-doherty

Post on 06-May-2015

1.019 views

Category:

Technology


0 download

DESCRIPTION

Presentation at NYC Storm Meetup #1 on the Kafka-Storm implementation used in production at Outbrain Engage to track thousands of web traffic pings per second.

TRANSCRIPT

Page 1: Nyc storm meetup_robdoherty

Storm at

Rob DohertySenior Backend [email protected]@robdoherty2

Page 2: Nyc storm meetup_robdoherty

What is Outbrain?

Page 3: Nyc storm meetup_robdoherty

Before Storm

● Custom distributed processing system● Python and ZMQ

● Advantages:○ Simple components○ Well-understood

● Disadvantages:○ Did not scale○ Batch-processing

Page 4: Nyc storm meetup_robdoherty

Kafka + Storm

● Kafka: high-throughput distributed messaging● Storm: distributed, real-time computation

Page 5: Nyc storm meetup_robdoherty

Why Kafka?

● Need method to buffer clicks into “stream”● Kafka + Storm common pattern for click tracking

Page 6: Nyc storm meetup_robdoherty

Why Storm?

● “Real time” (15s latency requirements)● Fault tolerance● Easy to manage parallelism● Stream grouping● Active community● Open-source project

Page 7: Nyc storm meetup_robdoherty

Nginx Servers

Kafka Cluster

Storm Topology

Elastic Load Balancer

Customer Traffic

AWS

MongoDB

Redis

Algo

API

Architecture

Page 8: Nyc storm meetup_robdoherty

Kafka Cluster

● 40 Producers (8 m1.large instances)○ Python brod

● 4 Brokers (4 m1.large instances)

10k Clicks per second (peak)

14B Clicks per month

Kafka v0.7.2

Page 9: Nyc storm meetup_robdoherty

Storm Topology

● 40 Supervisors (c1.xlarge instances)● 35 Bolts, 1 Kafka spout● 250+ Executors (worker threads)

160k+ tuples executed per second

Storm v0.82

Leiningen v1.7

Page 10: Nyc storm meetup_robdoherty

Customer Traffic

Kafka Spout Aggregate 15s

Aggregate 5m

Position

Customer

Social

Arrangement

Front Page

@Handle

Storm Topology

Page 11: Nyc storm meetup_robdoherty

Challenges

● Shell Bolts● Anchor Bolts/Replaying Stream● Acking Tuples● Monitoring

Page 12: Nyc storm meetup_robdoherty

Monitoring

● Scribe Logging● Munin + Nagios● JMX-JMXTrans + Ganglia

● Storm UI● Thrift interface into Nimbus + D3

Page 13: Nyc storm meetup_robdoherty
Page 14: Nyc storm meetup_robdoherty
Page 15: Nyc storm meetup_robdoherty

Monitoring

● Scribe Logging● Munin + Nagios● JMX-JMXTrans + Ganglia

● Storm UI● Thrift interface into Nimbus + D3

Page 16: Nyc storm meetup_robdoherty
Page 17: Nyc storm meetup_robdoherty
Page 18: Nyc storm meetup_robdoherty

Future Plans

● Load testing● Break topology into smaller pieces● Move from AWS to private data center

Page 19: Nyc storm meetup_robdoherty

Thank you

Rob DohertySenior Backend [email protected]@robdoherty2