Kafka, the "DialTone for Data": Building a self-service, scalable, streaming analytics system @ HomeAway
Rene Parra
© Copyright 2016 HomeAway, Inc.
Kafka: The “Dial Tone” for Data
HomeAway: The world leader for vacation rentals
More than 1 million listings (and growing!)
Agenda
• Overview
• The Problem
• The Experiment
• Results: Use Cases
• Lessons Learned
• Next Steps
Overview
Difference between Dinosaurs and Unicorns
In the old days: “Dial Tone” looked like this
ATDT
Today: Kafka is the modern “Dial Tone” for Data
Producer
Consumer
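The producer/consumer picture on this slide can be sketched as a toy append-only log. This is only an illustration of the idea, not Kafka's implementation: `InMemoryLog`, the consumer names, and the event fields are all made up for the example.

```python
from collections import defaultdict


class InMemoryLog:
    """Toy stand-in for a single topic partition (illustrative only)."""

    def __init__(self):
        self._records = []                # append-only list of events
        self._offsets = defaultdict(int)  # consumer name -> next offset to read

    def produce(self, event):
        """Append an event and return the offset it was stored at."""
        self._records.append(event)
        return len(self._records) - 1

    def consume(self, consumer, max_records=10):
        """Return the next batch for this consumer and advance its offset."""
        start = self._offsets[consumer]
        batch = self._records[start:start + max_records]
        self._offsets[consumer] = start + len(batch)
        return batch


log = InMemoryLog()
log.produce({"type": "search", "query": "beach house"})
log.produce({"type": "click", "listing": 42})

# Two independent consumers read the same events, each at its own pace.
first_batch = log.consume("dashboards", max_records=1)
all_events = log.consume("data-lake-loader")
```

The point of the sketch is that producers never need to know who is reading: each consumer only tracks its own offset into the shared log.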
The Problem
Our original problem/motivation
[Diagram labels: two app servers, each with a forwarder; two indexers; one search head]
• 1 TB/day ingress and growing!
• 40,000 calls/sec
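For a sense of scale, the two figures above can be turned into average rates. This is a back-of-the-envelope sketch: it assumes decimal terabytes and, for the per-call number, the hypothetical case that all ingress comes from those calls, which the slide does not state.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_bytes_per_second(bytes_per_day):
    """Spread a daily volume evenly over the day (ignores peaks)."""
    return bytes_per_day / SECONDS_PER_DAY


ingress_bytes_per_day = 1e12  # 1 TB/day from the slide, decimal TB assumed
calls_per_second = 40_000     # from the slide

avg_mb_per_second = average_bytes_per_second(ingress_bytes_per_day) / 1e6

# Hypothetical: only meaningful if every ingested byte came from these calls.
avg_bytes_per_call = (
    average_bytes_per_second(ingress_bytes_per_day) / calls_per_second
)

print(f"{avg_mb_per_second:.1f} MB/s average ingress")
print(f"{avg_bytes_per_call:.0f} bytes per call (under the stated assumption)")
```

Real traffic is bursty, so peak rates would be higher than these averages.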
Also… Historical Analytic Pipeline was slow/expensive
[Diagram labels: app server, OLTP, ETL, OLAP, analytics]
Fill the Lake! Alternatives?
Problem: Fill Hadoop!
[Diagram labels: Problem, Data Lake]
What we wanted… the Big Idea
If you can log it… you can analyze it!
How to build self-service?
Hypothesis: Use Kafka!
• 2 ms median latency
• 2 million events/sec! (3 cheap machines)
• The log: http://bit.ly/jay_on_logs
• “Benchmarking Apache Kafka”: http://goo.gl/pv5GoL
The Experiment
Experiment: Schema-on-Read, Schema-on-Write
HACommonsLogging
• KafkaAppender: schema-on-read
• KafkaAvroLogger: schema-on-write
[Diagram labels: Schema Registry, Data Lake]
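The difference between the two approaches on this slide can be sketched in a few lines. This is a simplified illustration, not the HACommonsLogging code: the schema, the field names, and the use of JSON in place of Avro are all assumptions made to keep the example self-contained.

```python
import json

# Hypothetical schema: field name -> required Python type.
SEARCH_EVENT_SCHEMA = {"user_id": str, "query": str, "results": int}


def validate(record, schema):
    """Raise ValueError unless the record has exactly the schema's fields and types."""
    if not isinstance(record, dict):
        raise ValueError("record must be an object")
    if set(record) != set(schema):
        raise ValueError(f"expected fields {sorted(schema)}, got {sorted(record)}")
    for field, expected_type in schema.items():
        if not isinstance(record[field], expected_type):
            raise ValueError(f"{field} must be {expected_type.__name__}")


def write_with_schema(topic, record, schema):
    """Schema-on-write: reject bad records before they enter the topic."""
    validate(record, schema)
    topic.append(json.dumps(record))


def write_raw(topic, line):
    """Schema-on-read: accept any log line as-is."""
    topic.append(line)


def read_with_schema(topic, schema):
    """Schema-on-read: interpret lines at read time, skipping ones that do not fit."""
    parsed = []
    for line in topic:
        try:
            record = json.loads(line)  # JSONDecodeError is a ValueError
            validate(record, schema)
        except ValueError:
            continue
        parsed.append(record)
    return parsed


# Schema-on-read: anything can be written; structure is applied later.
raw_topic = []
write_raw(raw_topic, '{"user_id": "u1", "query": "austin", "results": 12}')
write_raw(raw_topic, "plain text log line")
good = read_with_schema(raw_topic, SEARCH_EVENT_SCHEMA)

# Schema-on-write: malformed records never reach the topic.
typed_topic = []
write_with_schema(
    typed_topic, {"user_id": "u1", "query": "austin", "results": 12}, SEARCH_EVENT_SCHEMA
)
rejected = False
try:
    write_with_schema(typed_topic, {"user_id": "u1"}, SEARCH_EVENT_SCHEMA)
except ValueError:
    rejected = True
```

Schema-on-read keeps producers simple and pushes the parsing cost to every reader; schema-on-write does the opposite.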
Architecture: Kafka + Camus = BigData Ingress
Camus
The Results
Use Cases: ITOA / SLA Reporting
Use Cases: Fraud
Use Cases: Search + ClickStream
[Diagram labels: User Behavior, Search Requests, A/B Test Readouts, Proctor, EDAP]
Use Cases: Traveler Segmentation
[Diagram labels: EDAP, Data Model]
Lessons Learned
Lesson #1: The Schema [registry] is Everything!
[Diagram labels: Schema Registry, Data Lake]
• Decouples producers from consumers
• Enforces backwards compatibility
• Enables self-service / democratization
• Source of truth (SOT) for schemas in the pipeline
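The "enforces backwards compatibility" bullet can be illustrated with a toy registry. This is a deliberately simplified sketch, not the real Schema Registry or the full Avro resolution rules: `ToySchemaRegistry`, the dict-based schema format, and the subject name are hypothetical. The rule used here is that a new schema may add a field only if it has a default, and may not change the type of an existing field.

```python
def is_backward_compatible(old_schema, new_schema):
    """True if a reader using new_schema can read data written with old_schema."""
    for name, spec in new_schema.items():
        if name not in old_schema:
            # A field unknown to old writers needs a default to fall back on.
            if "default" not in spec:
                return False
        elif old_schema[name]["type"] != spec["type"]:
            return False
    return True


class ToySchemaRegistry:
    """Keeps every registered version per subject (illustrative only)."""

    def __init__(self):
        self._versions = {}  # subject -> list of schemas, oldest first

    def register(self, subject, schema):
        """Store the schema and return its version number, or raise ValueError."""
        versions = self._versions.setdefault(subject, [])
        if versions and not is_backward_compatible(versions[-1], schema):
            raise ValueError(f"schema for {subject!r} is not backward compatible")
        versions.append(schema)
        return len(versions)


registry = ToySchemaRegistry()

v1 = {"listing_id": {"type": "string"}}
v2 = {"listing_id": {"type": "string"},
      "country": {"type": "string", "default": "unknown"}}
v3 = {"listing_id": {"type": "string"},
      "country": {"type": "string", "default": "unknown"},
      "price": {"type": "double"}}  # new field without a default

first_version = registry.register("search-events", v1)
second_version = registry.register("search-events", v2)

rejected = False
try:
    registry.register("search-events", v3)
except ValueError:
    rejected = True
```

Because incompatible changes are refused at registration time, consumers can keep reading old and new records without coordinating releases with producers.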
Lesson #2: A Kafka / Schema Registry (SR) governance module is helpful
[Diagram label: Data Lake]
• TURN OFF Auto Topic Creation!
• Need a place for developers to request topics
  • Retention Policy
  • Expected Load
  • Compaction
  • Partition Size / Partition Key
  • Owner
  • LTS Date
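A topic request carrying the fields listed above could be modeled roughly as follows. This is a sketch under assumptions, not HomeAway's governance module: `TopicRequest`, `review`, the field names, units, and validation rules are hypothetical. Turning off auto topic creation itself is done on the broker with `auto.create.topics.enable=false`.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TopicRequest:
    """One developer request for a new topic (hypothetical field names)."""
    name: str
    owner: str
    retention_hours: int
    expected_events_per_sec: int
    compacted: bool
    partitions: int
    partition_key: str
    lts_date: date  # "LTS Date" on the slide; the acronym is not expanded there


def review(request):
    """Return a list of problems; an empty list means the request can proceed."""
    problems = []
    if not request.owner:
        problems.append("owner is required")
    if request.retention_hours <= 0:
        problems.append("retention must be positive")
    if request.expected_events_per_sec < 0:
        problems.append("expected load cannot be negative")
    if request.partitions < 1:
        problems.append("at least one partition is required")
    if request.compacted and not request.partition_key:
        # Log compaction keeps the latest record per key, so a key is needed.
        problems.append("compacted topics need a partition key")
    return problems


good_request = TopicRequest(
    name="search-events", owner="search-team", retention_hours=168,
    expected_events_per_sec=5000, compacted=False, partitions=12,
    partition_key="user_id", lts_date=date(2017, 1, 1),
)
bad_request = TopicRequest(
    name="listing-state", owner="", retention_hours=0,
    expected_events_per_sec=100, compacted=True, partitions=3,
    partition_key="", lts_date=date(2017, 1, 1),
)

good_problems = review(good_request)
bad_problems = review(bad_request)
```

Capturing the owner and expected load up front gives operators what they need to size partitions before a topic exists.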
Lesson #3: Make it easy to do stream processing
[Diagram label: Schema Registry]
• samza-archetype
• samza-job-deployer
• Will evaluate Kafka Streams (k-streams)!
http://www.confluent.io/blog/introducing-kafka-streams-stream-processing-made-simple
Next Steps
Consistency: 3 types of data
• Event
• Document
• Transactional
Kafka Producer Spooling
Conclusion
Yesterday: Systems of Record
Today: Systems of Engagement
Tomorrow: Systems of Intelligence
Don’t be a dinosaur…
ATDT
Thank you
End of Presentation