Fluentd and Kafka
TRANSCRIPT
Hadoop / Spark Conference Japan 2016, Feb 8, 2016
Who are you?
• Masahiro Nakagawa • github: @repeatedly
• Treasure Data Inc. • Fluentd / td-agent developer • Fluentd Enterprise support
• I love OSS :) • D Language, MessagePack, The organizer of several meetups, etc…
Fluentd
• Pluggable streaming event collector
• Lightweight, robust and flexible • Lots of plugins on rubygems • Used by AWS, GCP, MS and more companies
• Resources • http://www.fluentd.org/ • Webinar: https://www.youtube.com/watch?v=6uPB_M7cbYk
Popular case
App -(push)-> Forwarder -(push)-> Aggregator -> Destination
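The forwarder-to-aggregator flow above can be sketched with Fluentd's built-in forward plugins. This is a minimal illustrative config for the forwarder side; the host name and port are placeholders, and the aggregator would run a matching in_forward source:

```
# Forwarder node (runs next to the app)
<source>
  @type forward
</source>

<match **>
  @type forward
  <server>
    # placeholder address of the aggregator node
    host aggregator.example.com
    port 24224
  </server>
</match>
```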
Apache Kafka
• Distributed messaging system
• Producer - Broker - Consumer pattern
• Pull model, replication, etc.
App -> Producer -(push)-> Broker <-(pull)- Consumer -> Destination
Push vs Pull
• Push: • Easy to transfer data to multiple destinations • Hard to control stream ratio in multiple streams
• Pull: • Easy to control stream flow / ratio • Should manage consumers correctly
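The push/pull contrast above can be sketched with an in-memory toy. This is plain Python, not Kafka's API; all class and method names here are illustrative:

```python
class Broker:
    """Toy broker holding an append-only log, like a Kafka partition."""

    def __init__(self):
        self.log = []          # append-only event log
        self.subscribers = []  # push-style callbacks

    def publish(self, event):
        # Push model: the broker forwards every event immediately,
        # so a slow subscriber cannot control its own intake rate.
        self.log.append(event)
        for callback in self.subscribers:
            callback(event)


class PullConsumer:
    """Pull model: the consumer tracks its own offset and fetches
    at whatever pace it likes (easy flow/ratio control)."""

    def __init__(self, broker):
        self.broker = broker
        self.offset = 0

    def poll(self, max_events):
        batch = self.broker.log[self.offset:self.offset + max_events]
        self.offset += len(batch)
        return batch


broker = Broker()
pushed = []
broker.subscribers.append(pushed.append)  # push delivery: broker decides when

for i in range(5):
    broker.publish(i)

consumer = PullConsumer(broker)
first = consumer.poll(2)   # pull delivery: consumer decides how much
rest = consumer.poll(10)
```

Here `pushed` receives all five events as soon as they are published, while the pull consumer drains the same log in batches it chooses itself.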
There are 2 ways
• fluent-plugin-kafka
• kafka-fluentd-consumer
fluent-plugin-kafka
• Input / Output plugin for Kafka
• https://github.com/htgc/fluent-plugin-kafka • in_kafka, in_kafka_group, out_kafka, out_kafka_buffered
• Pros • Easy to use and output support
• Cons • Performance is not the primary focus
Configuration example
<source>
  @type kafka
  topics web,system
  format json
  add_prefix kafka.
  # more options
</source>

<match kafka.**>
  @type kafka_buffered
  output_data_type msgpack
  default_topic metrics
  compression_codec gzip
  required_acks 1
</match>
https://github.com/htgc/fluent-plugin-kafka#usage
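For consumer-group-balanced consumption, the plugin also provides in_kafka_group. This is a hedged sketch: option names follow the plugin's README, and the broker address, group name, and topics are placeholders:

```
<source>
  @type kafka_group
  brokers localhost:9092
  consumer_group my-fluentd-group
  topics web,system
  format json
</source>
```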
kafka-fluentd-consumer
• Stand-alone Kafka consumer for Fluentd
• https://github.com/treasure-data/kafka-fluentd-consumer
• Sends consumed events to Fluentd's in_forward
• Pros • High performance and Java API features
• Cons • Need Java runtime
Run consumer
• Edit log4j and fluentd-consumer properties
• Run the following command:
$ java \
  -Dlog4j.configuration=file:///path/to/log4j.properties \
  -jar path/to/kafka-fluentd-consumer-0.2.1-all.jar \
  path/to/fluentd-consumer.properties
Properties example
fluentd.tag.prefix=kafka.event.
fluentd.record.format=regexp # default is json
fluentd.record.pattern=(?<text>.*) # for regexp format
fluentd.consumer.topics=app.* # can use Java Regex
fluentd.consumer.topics.pattern=blacklist # default is whitelist
fluentd.consumer.threads=5
https://github.com/treasure-data/kafka-fluentd-consumer/blob/master/config/fluentd-consumer.properties
With Fluentd example
<source>
  @type forward
</source>

<source>
  @type exec
  command java -Dlog4j.configuration=file:///path/to/log4j.properties -jar /path/to/kafka-fluentd-consumer-0.2.1-all.jar /path/to/config/fluentd-consumer.properties
  tag dummy
  format json
</source>
https://github.com/treasure-data/kafka-fluentd-consumer#run-kafka-consumer-for-fluentd-via-in_exec
Conclusion
• Kafka has become an important component of data platforms
• Fluentd can communicate with Kafka • via the Fluentd plugin and the Kafka consumer
• Build reliable and flexible data pipelines with Fluentd and Kafka