![Page 1: Flurry Analytic Backend - Processing Terabytes of Data in Real-time](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a9a2760da3da068b70b0/html5/thumbnails/1.jpg)
www.flurry.com
November 14, 2013
Anthony Watkins, Senior Director of Developer Relations
Processing Terabytes of Data in Real-Time
@flurrymobile
@antwatkins
![Page 2: Flurry Analytic Backend - Processing Terabytes of Data in Real-time](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a9a2760da3da068b70b0/html5/thumbnails/2.jpg)
![Page 3: Flurry Analytic Backend - Processing Terabytes of Data in Real-time](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a9a2760da3da068b70b0/html5/thumbnails/3.jpg)
Flurry is a leading mobile advertising and analytics provider
Publisher
Advertiser
Audience
AppCircle Applications: 10,000+
Devices/month: 300M
Conversions/month: 120M
AppSpot Applications: 2,500+
Devices/month: 250M
Impressions/month: 7.5B
Analytics Applications: 400,000
Devices/month: 1.2B
Data points/month: 1.9T
![Page 4: Flurry Analytic Backend - Processing Terabytes of Data in Real-time](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a9a2760da3da068b70b0/html5/thumbnails/4.jpg)
Topics
The Path to Real-Time Processing
• Why Flurry switched from a MapReduce framework to pipeline processing
• How Flurry uses Kafka in data processing
• Tuning of Kafka to work in Flurry’s environment
• Flurry monitoring and error handling of streams
![Page 5: Flurry Analytic Backend - Processing Terabytes of Data in Real-time](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a9a2760da3da068b70b0/html5/thumbnails/5.jpg)
The Why
![Page 6: Flurry Analytic Backend - Processing Terabytes of Data in Real-time](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a9a2760da3da068b70b0/html5/thumbnails/6.jpg)
Past Processing Model
Device Reports
NoSQL DataStore
Batch
Collectors
MapReduce
(jobs)
External
Action
![Page 7: Flurry Analytic Backend - Processing Terabytes of Data in Real-time](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a9a2760da3da068b70b0/html5/thumbnails/7.jpg)
Flurry Analytics MapReduce Architecture
Agent Portal
Developer Portal
Data Log Processor
Metrics Computer
HDFS
HBase
HBase
Hadoop/HBase
Jetty
Jetty
HTTP
Binary Encoded Data
Raw Data
Log Archive
Metrics Table (Cube)
Normalized Data Storage
User Profile Data
MySQL
Hadoop Map/Reduce
Hadoop Map/Reduce
Web Layer
Metrics Processing
![Page 8: Flurry Analytic Backend - Processing Terabytes of Data in Real-time](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a9a2760da3da068b70b0/html5/thumbnails/8.jpg)
Data Collection and Processing in MR
Pros
MapReduce
(jobs)
![Page 9: Flurry Analytic Backend - Processing Terabytes of Data in Real-time](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a9a2760da3da068b70b0/html5/thumbnails/9.jpg)
Data Collection and Processing in MR
Cons
Device Reports
MapReduce
(jobs)
Job Time
Startup Time
![Page 10: Flurry Analytic Backend - Processing Terabytes of Data in Real-time](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a9a2760da3da068b70b0/html5/thumbnails/10.jpg)
Flurry Kafka
The Move to Kafka
![Page 11: Flurry Analytic Backend - Processing Terabytes of Data in Real-time](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a9a2760da3da068b70b0/html5/thumbnails/11.jpg)
About Kafka
Origin
November 2010 June 2011 November 2012
![Page 12: Flurry Analytic Backend - Processing Terabytes of Data in Real-time](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a9a2760da3da068b70b0/html5/thumbnails/12.jpg)
About Kafka
Producer Producer Producer
Kafka Broker
Consumer Consumer Consumer
![Page 13: Flurry Analytic Backend - Processing Terabytes of Data in Real-time](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a9a2760da3da068b70b0/html5/thumbnails/13.jpg)
About Kafka
Kafka Broker
*
* Partition image courtesy of http://kafka.apache.org/images/log_anatomy.png
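The partition image referenced above depicts a partition as an append-only log in which every message has a position (offset) and consumers read forward from an offset they track themselves. A minimal sketch of that idea follows. The class `PartitionLog` and its methods are hypothetical names chosen for illustration, and the offsets here are simple sequence numbers; Kafka's actual on-disk format and offset scheme differ by version.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PartitionLog:
    """Toy append-only log (hypothetical, not Kafka's implementation)."""
    messages: List[bytes] = field(default_factory=list)

    def append(self, message: bytes) -> int:
        """Append a message at the end and return the offset it was stored at."""
        self.messages.append(message)
        return len(self.messages) - 1

    def read(self, offset: int, max_messages: int = 10) -> List[bytes]:
        """Return up to max_messages starting at offset; the log is not modified."""
        return self.messages[offset:offset + max_messages]


log = PartitionLog()
offsets = [log.append(m) for m in (b"report-a", b"report-b", b"report-c")]

# A consumer that has already processed offset 0 resumes from offset 1.
remaining = log.read(1)
print(offsets, remaining)
```

Because reads never remove data, several consumers can read the same log at different offsets without interfering with each other.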
![Page 14: Flurry Analytic Backend - Processing Terabytes of Data in Real-time](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a9a2760da3da068b70b0/html5/thumbnails/14.jpg)
About Kafka
Producer 1 Producer N Producer 2
Kafka Cluster
Broker 1
P0 P2
Broker 2
P1 P3
Consumer Group
C1 C2 C3
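In the layout above, four partitions (P0 to P3) are spread over two brokers and read by a consumer group of three. Within a group, each partition is read by exactly one consumer, so with more partitions than consumers some consumer takes more than one. The sketch below illustrates this with a simple round-robin split. The function `assign_partitions` is a hypothetical helper, and round-robin is an assumption made for clarity, not necessarily the assignment strategy Kafka or Flurry used.

```python
from typing import Dict, List


def assign_partitions(partitions: List[str], consumers: List[str]) -> Dict[str, List[str]]:
    """Give every partition to exactly one consumer in the group (round-robin)."""
    assignment: Dict[str, List[str]] = {c: [] for c in consumers}
    for i, partition in enumerate(sorted(partitions)):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return assignment


assignment = assign_partitions(["P0", "P1", "P2", "P3"], ["C1", "C2", "C3"])
for consumer, owned in assignment.items():
    print(consumer, owned)
```

With these inputs C1 owns two partitions while C2 and C3 own one each. Adding a fourth consumer would give every consumer exactly one partition.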
![Page 15: Flurry Analytic Backend - Processing Terabytes of Data in Real-time](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a9a2760da3da068b70b0/html5/thumbnails/15.jpg)
Why Kafka for Flurry
Device Reports
MapReduce
(jobs) Kafka
Startup
Time
![Page 16: Flurry Analytic Backend - Processing Terabytes of Data in Real-time](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a9a2760da3da068b70b0/html5/thumbnails/16.jpg)
Introducing the Data Log Consumer (DLC)
Agent Portal
Developer Portal
Data Log Consumer
Metrics Computer
HDFS
HBase
HBase
Hadoop/HBase
Jetty
Jetty
HTTP
Binary Encoded Data
Metrics Table (Cube)
Normalized Data Storage
User Profile Data
MySQL
Kafka
Hadoop Map/Reduce
Web Layer
Metrics Processing
![Page 17: Flurry Analytic Backend - Processing Terabytes of Data in Real-time](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a9a2760da3da068b70b0/html5/thumbnails/17.jpg)
Tuning Kafka for Flurry
Challenges
• ZooKeeper timeouts
• Completely async service
• Default fsync interval
• Commit threshold from local environments
![Page 18: Flurry Analytic Backend - Processing Terabytes of Data in Real-time](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a9a2760da3da068b70b0/html5/thumbnails/18.jpg)
How Flurry Uses Kafka
Infrastructure and Setup
Consumer Group
C1 C2 C… C325
Kafka Cluster
B1 B2 B3
Broker
P1 P2 P… P400
Topic
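The setup above lists consumers C1 to C325 and partitions P1 to P400. Assuming those are totals for a single topic and that partitions are spread as evenly as possible over the group (both are assumptions; the slide does not say), the per-consumer load can be worked out with a short calculation. The helper `partitions_per_consumer` is a hypothetical name used only for this sketch.

```python
from collections import Counter
from typing import List


def partitions_per_consumer(num_partitions: int, num_consumers: int) -> List[int]:
    """Return how many partitions each consumer owns under an even split."""
    base, extra = divmod(num_partitions, num_consumers)
    # 'extra' consumers take one partition more than the rest.
    return [base + 1] * extra + [base] * (num_consumers - extra)


loads = partitions_per_consumer(400, 325)
print(Counter(loads))  # 250 consumers with 1 partition, 75 consumers with 2
```

Under these assumptions no consumer sits idle, since the partition count is at least the consumer count, and 75 of the 325 consumers carry a double share.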
![Page 19: Flurry Analytic Backend - Processing Terabytes of Data in Real-time](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a9a2760da3da068b70b0/html5/thumbnails/19.jpg)
Flurry Monitoring / Error Handling
Monitoring
• Alerts
• Consumer Failure
• Broker Failure
Error Handling
![Page 20: Flurry Analytic Backend - Processing Terabytes of Data in Real-time](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a9a2760da3da068b70b0/html5/thumbnails/20.jpg)
Next Steps: 0.8
Data Log Consumer
HDFS
Kafka
Data Log Consumer
Kafka
Kafka Cluster
Broker 1
P0 P2
Broker 2
P1 P3
P1’ P3’ P0’ P2’
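Kafka 0.8 added partition replication, and the primed labels above (P0’ to P3’) appear to be replica copies placed on the broker that does not hold the original. The sketch below reproduces that layout under this reading. The function `place_partitions` and the round-robin placement rule are illustrative assumptions, not Kafka's actual replica assignment algorithm.

```python
from typing import Dict, List


def place_partitions(partitions: List[str], brokers: List[str],
                     replication_factor: int) -> Dict[str, Dict[str, List[str]]]:
    """Place each partition's primary copy and its replicas on distinct brokers."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    layout = {b: {"primary": [], "replica": []} for b in brokers}
    for i, partition in enumerate(partitions):
        layout[brokers[i % len(brokers)]]["primary"].append(partition)
        # Each additional copy goes to the next broker in the ring.
        for r in range(1, replication_factor):
            follower = brokers[(i + r) % len(brokers)]
            layout[follower]["replica"].append(partition + "'")
    return layout


layout = place_partitions(["P0", "P1", "P2", "P3"], ["Broker 1", "Broker 2"], 2)
for broker, copies in layout.items():
    print(broker, copies)
```

With two brokers and a replication factor of 2, every partition exists on both brokers, so losing a single broker does not lose any partition's data.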
![Page 21: Flurry Analytic Backend - Processing Terabytes of Data in Real-time](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a9a2760da3da068b70b0/html5/thumbnails/21.jpg)
Next Steps: Extended Pipeline
Input Data
NoSQL DataStore
Real-Time Batch
Collectors
Consumer/
Producer
Systems
MapReduce
(jobs)
External Action
External Action
![Page 22: Flurry Analytic Backend - Processing Terabytes of Data in Real-time](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a9a2760da3da068b70b0/html5/thumbnails/22.jpg)
Next Steps: Topics and Consumer Groups
Infrastructure and Setup
Consumer Group 2
C1’ C2’ C… CN’
Topic 1
Consumer Group 1
C1 C2 C… CN
Consumer Group N
C1’’ C2’’ C… CN’’
Topic 2
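The diagram above shows several consumer groups attached to topics. In Kafka, each consumer group tracks its own position in a topic, so every group receives the full message stream independently of the others. A minimal sketch of that behavior follows. The `Topic` class is hypothetical and collapses the topic to a single partition for brevity.

```python
from typing import Dict, List


class Topic:
    """Toy single-partition topic with one read offset per consumer group."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.messages: List[str] = []
        self.group_offsets: Dict[str, int] = {}

    def publish(self, message: str) -> None:
        self.messages.append(message)

    def poll(self, group: str) -> List[str]:
        """Return messages the group has not yet seen and advance its offset."""
        offset = self.group_offsets.get(group, 0)
        batch = self.messages[offset:]
        self.group_offsets[group] = len(self.messages)
        return batch


topic = Topic("Topic 1")
topic.publish("report-1")
topic.publish("report-2")

first_g1 = topic.poll("Consumer Group 1")   # both messages
first_g2 = topic.poll("Consumer Group 2")   # the same two messages again
topic.publish("report-3")
second_g1 = topic.poll("Consumer Group 1")  # only the new message
print(first_g1, first_g2, second_g1)
```

This is what lets a new downstream system be added as another consumer group without changing producers or existing consumers.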
![Page 23: Flurry Analytic Backend - Processing Terabytes of Data in Real-time](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a9a2760da3da068b70b0/html5/thumbnails/23.jpg)
www.flurry.com
November 14, 2013
blog.flurry.com
@flurrymobile
@antwatkins
Thank you