fluentd - set up once, collect more
DESCRIPTION
Fluentd meetup @ Rackspace San Francisco 2014-02-19
TRANSCRIPT
Sadayuki Furuhashi
Founder & Software Architect
Set Up Once, Collect More.
Treasure Data, Inc.
Self-introduction
> Sadayuki Furuhashi
  github/twitter: @frsyuki
> Treasure Data, Inc.
  Founder & Software Architect
> Open source projects
  MessagePack - efficient object serializer
  Fluentd - data collection tool
  ServerEngine - Ruby framework to build multiprocess servers
  LS4 - distributed object storage system (suspended)
  kumofs - distributed key-value data store (suspended)
What’s Fluentd?
An extensible & reliable data collection tool
simple core + plugins
buffering, HA (failover), load balance, etc.
like syslogd
[Diagram labels: Blueflood, MongoDB, Hadoop, Metrics, Amazon S3, Analysis, Archiving, MySQL, Apache, Frontend, Access logs, syslogd, App logs, System logs, Backend, Your system]
[Diagram center: filter / buffer / routing]
Input Plugins
Output Plugins
Buffer Plugins
(Filter Plugins)
# logs from a file
<source>
  type tail
  path /var/log/httpd.log
  format apache2
  tag web.access
</source>

# logs from client libraries
<source>
  type forward
  port 24224
</source>

# store logs to MongoDB and S3
# (the copy output takes its destinations as <store> sections)
<match **>
  type copy

  <store>
    type mongo
    host mongo.example.com
    capped
    capped_size 200m
  </store>

  <store>
    type s3
    path archive/
  </store>
</match>
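
The `<match **>` pattern above selects events by tag. As a rough sketch of how such patterns can be evaluated, the Ruby below treats `*` as exactly one tag part and `**` as zero or more tag parts. The function names `tag_match?` and `match_parts` are hypothetical and this is not Fluentd's own matcher; other pattern forms such as `{a,b}` are left out.

```ruby
# Simplified tag matcher for illustration (hypothetical helper names).
# '*'  matches exactly one dot-separated tag part.
# '**' matches zero or more tag parts.
def tag_match?(pattern, tag)
  match_parts(pattern.split('.'), tag.split('.'))
end

def match_parts(pat, tag)
  # An exhausted pattern matches only an exhausted tag.
  return tag.empty? if pat.empty?

  head, *rest = pat
  case head
  when '**'
    # Try letting '**' consume 0, 1, ... up to all remaining parts.
    (0..tag.size).any? { |n| match_parts(rest, tag.drop(n)) }
  when '*'
    !tag.empty? && match_parts(rest, tag.drop(1))
  else
    tag.first == head && match_parts(rest, tag.drop(1))
  end
end

puts tag_match?('**', 'web.access')
puts tag_match?('web.*', 'web.access.extra')
```

With these rules, `**` accepts every tag, which is why the configuration above copies all events to both stores.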
[Diagram labels]
API servers: Rails app, Fluentd
Queue: PerfectQueue
worker servers: Ruby app, Fluentd
fluent-logger-ruby + in_forward
watch server: script, in_exec
out_forward
Fluentd in Treasure Data
watch server
Fluentd
Librato Metrics for realtime analysis
Treasure Data for historical analysis
out_tdlog, out_metricsense
✓ streaming aggregation
Internal Architecture
Input Plugin
Buffer Plugin
Output Plugin

time:   2012-02-04 01:33:51
tag:    myapp.buylog
record: {"user": "me", "path": "/buyItem", "price": 150, "referer": "/landing"}
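
To make the time / tag / record structure concrete, here is a minimal Ruby sketch that models one event and serializes it as a JSON line. The `Event` struct and its `to_line` method are hypothetical names used only for this illustration; Fluentd plugins receive tag, time and record as separate values, and the time is assumed here to be UTC.

```ruby
require 'json'

# Hypothetical container for illustration; not a Fluentd class.
Event = Struct.new(:tag, :time, :record) do
  # Serialize as one JSON array: [tag, unix_time, record]
  def to_line
    [tag, time.to_i, record].to_json
  end
end

event = Event.new(
  'myapp.buylog',
  Time.utc(2012, 2, 4, 1, 33, 51),  # assumption: the slide's timestamp is UTC
  { 'user' => 'me', 'path' => '/buyItem', 'price' => 150, 'referer' => '/landing' }
)

puts event.to_line
```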
Architecture :: Input plugins
Input Plugin
✓ Receive logs
✓ Or pull logs from data sources
✓ in non-blocking manner
HTTP+JSON (in_http)
File tail (in_tail)
Syslog (in_syslog)
...
Architecture :: Output plugins
Output Plugin
✓ Write or send event logs
File (out_file)
Amazon S3 (out_s3)
MongoDB (out_mongo)
...
Architecture :: Buffer plugins
Buffer Plugin
✓ Improve performance
✓ Provide reliability
✓ Provide thread-safety
Memory (buf_memory)
File (buf_file)
[Diagram: Input → Buffer (chunk, chunk, chunk) → output]
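
The chunk idea can be sketched in a few lines of Ruby. This is a simplified, single-threaded model written for this transcript, not Fluentd's buf_memory or buf_file: the class name `ChunkedBuffer` and its methods are hypothetical, and the locking that a real buffer needs for thread-safety is left out.

```ruby
# Hypothetical in-memory buffer for illustration only.
class ChunkedBuffer
  attr_reader :queue

  def initialize(chunk_limit)
    @chunk_limit = chunk_limit  # maximum bytes per chunk
    @current = ''.dup           # chunk currently being filled by the input side
    @queue = []                 # full chunks waiting for the output side
  end

  # Input side: append formatted data, rotating to a new chunk
  # when the current one would exceed the limit.
  def emit(data)
    if !@current.empty? && @current.bytesize + data.bytesize > @chunk_limit
      @queue << @current
      @current = ''.dup
    end
    @current << data
  end

  # Output side: pass each queued chunk to the block.
  # A chunk is removed only after the block returns without raising,
  # so a failed write leaves it queued for a later attempt.
  def flush
    until @queue.empty?
      yield @queue.first
      @queue.shift
    end
  end
end

buffer = ChunkedBuffer.new(10)
%w[aaaa bbbb cccc dddd].each { |s| buffer.emit(s) }

written = []
buffer.flush { |chunk| written << chunk }
puts written.inspect
```

Writing whole chunks rather than single records is what lets the output side batch its work, and keeping a chunk queued until the write succeeds is the basis for retrying.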
in_tail
[Diagram: Apache writes /var/log/access.log; Fluentd reads it with in_tail and buffers with buf_file in /var/log/fluentd/buffer]
✓ retrying automatically,
✓ with exponential wait,
✓ and persistence on a disk.
✓ buffering for any outputs,
✓ with exponential wait,
✓ and persistence on a disk.
Outputs: Amazon S3, Hadoop
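
"Retrying automatically with exponential wait" can be illustrated with a short Ruby sketch. The helper `write_with_retry` and its parameters are hypothetical and written for this example; the doubling schedule is an assumption and is not claimed to be Fluentd's exact retry schedule. Persistence on disk is not modelled here.

```ruby
# Hypothetical helper: run the block, retrying on failure with doubling waits.
# `sleeper` is injectable so the example runs without real delays.
def write_with_retry(retry_wait: 1.0, retry_limit: 5, sleeper: ->(s) { sleep(s) })
  attempts = 0
  begin
    attempts += 1
    yield
  rescue StandardError
    # Give up once the number of retries is used up.
    raise if attempts > retry_limit
    # Wait retry_wait, then 2x, 4x, ... before the next attempt.
    sleeper.call(retry_wait * (2**(attempts - 1)))
    retry
  end
end

waits = []
calls = 0
result = write_with_retry(sleeper: ->(s) { waits << s }) do
  calls += 1
  raise IOError, 'destination unavailable' if calls < 4
  :written
end

puts result
puts waits.inspect
```

The growing wait keeps a recovering destination from being hit with a burst of retries from every sender at once.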
[Diagram: applications, log files, HTTP, etc. send to fluentd; Fluentd nodes forward to other Fluentd nodes, with Heartbeat between them]
✓ load balancing or active-backup
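
One way to picture "load balancing or active-backup" is a selector that rotates over live nodes and falls back to standby nodes only when no primary is alive. The Ruby below is a sketch under that assumption; `Node` and `NodeSelector` are hypothetical names, the `alive` flag stands in for whatever the heartbeat reports, and this is not the logic of Fluentd's forward output.

```ruby
# Hypothetical node record: `alive` would be updated by heartbeats,
# `standby` marks a backup node.
Node = Struct.new(:name, :alive, :standby)

class NodeSelector
  def initialize(nodes)
    @nodes = nodes
    @counter = 0
  end

  # Round-robin over live primary nodes; use live standby nodes
  # only when no primary is alive.
  def next_node
    primaries = @nodes.select { |n| n.alive && !n.standby }
    candidates = primaries.empty? ? @nodes.select { |n| n.alive && n.standby } : primaries
    raise 'no live nodes' if candidates.empty?

    node = candidates[@counter % candidates.size]
    @counter += 1
    node
  end
end

nodes = [Node.new('a', true, false), Node.new('b', true, false), Node.new('c', true, true)]
selector = NodeSelector.new(nodes)

picked = 4.times.map { selector.next_node.name }

# Simulate both primaries missing their heartbeats.
nodes[0].alive = false
nodes[1].alive = false
fallback = selector.next_node.name

puts picked.inspect
puts fallback
```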
require 'cassandra'
require 'msgpack'

module Fluent
  class CassandraOutput < BufferedOutput
    Fluent::Plugin.register_output('cassandra', self)

    config_param :keyspace, :string
    config_param :columnfamily, :string
    config_param :host, :string, :default => 'localhost'
    config_param :port, :integer, :default => 9160

    # Open the connection when the plugin starts
    def start
      super
      @connection = Cassandra.new(@keyspace, "#{@host}:#{@port}")
    end

    # Serialize each event into the buffer chunk
    def format(tag, time, record)
      record['tag'] = tag
      record['time'] = time
      record.to_msgpack
    end

    # Write every record of a flushed chunk to Cassandra
    def write(chunk)
      chunk.msgpack_each do |record|
        @connection.insert(@columnfamily, "#{record["tag"]}_#{record["time"]}", record)
      end
    end
  end
end
out_cassandra
Use cases
http://www.slideshare.net/tagomoris/rubykaigi-2013-111130
“Complex Event Processing on Ruby, Fluentd and Norikra” TAGOMORI Satoshi, RubyKaigi 2013
http://www.slideshare.net/tagomoris/log-analysis-with-hadoop-in-livedoor-2013
“Log analysis system with Hadoop” NHN Japan Corp., Hadoop Conference Japan 2013
http://www.slideshare.net/sylvainkalache/fluentd-at-slideshare
“fluentd at slideshare” @SylvainKalache, Fluentd meetup
http://www.slideshare.net/frsyuki/how-24042353
“How we use Fluentd in Treasure Data” Sadayuki Furuhashi, Fluentd meetup at slideshare
http://www.slideshare.net/sematext/solr-for-indexing-and-searching-logs
“Using Solr to Search and Analyze Logs” Radu Gheorghe
http://docs.fluentd.org/articles/free-alternative-to-splunk-by-fluentd
“Free Alternative to Splunk Using Fluentd”
Expected discussions...
> Who is using Fluentd?
> What are the differences compared to XYZ?
> Is there a plugin to send/recv data to/from XYZ?
> How can my system XYZ send data to Fluentd?
> Does Fluentd really work in case of XYZ?
class SomeInput < Fluent::Input
  Fluent::Plugin.register_input('myin', self)

  config_param :tag, :string

  def start
    # Emit a sample record from a background thread
    Thread.new {
      while true
        time = Fluent::Engine.now
        record = {"user"=>1, "size"=>1}
        Fluent::Engine.emit(@tag, time, record)
      end
    }
  end

  def shutdown
    ...
  end
end

<source>
  type myin
  tag myapp.api.heartbeat
</source>
class SomeOutput < Fluent::BufferedOutput
  Fluent::Plugin.register_output('myout', self)

  config_param :myparam, :string

  # Serialize each event as one JSON line in the buffer chunk
  def format(tag, time, record)
    [tag, time, record].to_json + "\n"
  end

  # Called with a flushed chunk; here it only prints the contents
  def write(chunk)
    puts chunk.read
  end
end

<match **>
  type myout
  myparam foobar
</match>
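
The format / write pair above can be tried outside Fluentd with a stand-in for the chunk. In this sketch a `StringIO` plays the role of the buffer chunk, and `format_event` is a hypothetical standalone copy of the plugin's `format` method; the tag and timestamps are made-up sample values.

```ruby
require 'json'
require 'stringio'

# Standalone copy of the plugin's format step (hypothetical name).
def format_event(tag, time, record)
  [tag, time, record].to_json + "\n"
end

# A StringIO stands in for a buffer chunk in this sketch.
chunk = StringIO.new
chunk << format_event('myapp.api', 1328319231, { 'user' => 1, 'size' => 1 })
chunk << format_event('myapp.api', 1328319232, { 'user' => 2, 'size' => 3 })

# The write step reads the chunk back; one JSON document per line.
chunk.rewind
events = chunk.each_line.map { |line| JSON.parse(line) }
events.each { |tag, time, record| puts "#{tag} #{time} #{record['size']}" }
```

Ending each formatted event with a newline is what makes the chunk splittable again on the write side.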
class MyTailInput < Fluent::TailInput
  Fluent::Plugin.register_input('mytail', self)

  def configure_parser(conf)
    ...
  end

  # Turn one tab-separated line into a time and a record
  def parse_line(line)
    array = line.split("\t")
    record = {"user"=>array[0], "item"=>array[1]}
    time = Fluent::Engine.now
    return time, record
  end
end

<source>
  type mytail
</source>
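
The parsing step can be checked on its own, without Fluentd loaded. The sketch below is a standalone variant of `parse_line`: the `now:` keyword is an addition for this example so the time can be fixed in a test, and `chomp` is added on the assumption that lines may arrive with a trailing newline.

```ruby
# Standalone variant of parse_line for illustration.
# `now:` is a hypothetical parameter replacing the engine clock.
def parse_line(line, now: Time.now.to_i)
  user, item = line.chomp.split("\t")
  [now, { 'user' => user, 'item' => item }]
end

time, record = parse_line("alice\tbook\n", now: 1328319231)
puts time
puts record.inspect
```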