Fluentd: Unified Logging Layer at FOSSASIA
TRANSCRIPT
Masahiro Nakagawa, Mar 14, 2015
FOSSASIA 2015
Fluentd: Unified logging layer
Who am I
> Masahiro Nakagawa
> github: @repeatedly
> Treasure Data, Inc.
  > Senior Software Engineer
  > Fluentd / td-agent developer
> Living in OSS :)
  > D language - committer for Phobos, a.k.a. the standard library
  > Fluentd - main maintainer
  > MessagePack / RPC - D and Python (RPC only)
  > Organizer of several meetups (Presto, DTM, etc.)
  > etc.
Structured logging!
Reliable forwarding!
Pluggable architecture
http://fluentd.org/
github:fluent/fluentd
What’s Fluentd?
> Data collector for a unified logging layer
> Streaming data transfer based on JSON
> Simple core + plugins written in Ruby
> Various gem-based plugins
  > http://www.fluentd.org/plugins
> List of users
  > http://www.fluentd.org/testimonials
Before
✓ duplicated code for error handling...
✓ messy code for retry mechanisms...
So painful!
After
Concept / Design
Core / Plugins
> Core handles the common concerns:
  > Divide & Conquer
  > Buffering & Retrying
  > Error handling
  > Message routing
  > Parallelism
> Plugins handle the use-case-specific parts:
  > Read / receive data
  > Parse data
  > Filter data
  > Buffer data
  > Format data
  > Write / send data
Event structure (log message)
✓ Time
  > second unit by default
  > from the data source
✓ Tag
  > for message routing
  > where is it from?
✓ Record
  > JSON format
  > MessagePack internally
  > schema-free
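An event is in effect a (tag, time, record) triple. A minimal sketch in Ruby (the hash layout and field values here are illustrative, not Fluentd's internal representation):

```ruby
require "json"
require "time"

# A Fluentd event: tag drives routing, time is in seconds,
# and the record is a schema-free JSON object.
event = {
  tag:    "backend.apache",                            # where is it from?
  time:   Time.parse("2015-03-14 10:00:00 UTC").to_i,  # second unit
  record: { "method" => "GET", "path" => "/", "code" => 200 }  # schema-free
}

puts JSON.generate(event[:record])
```

Because the record is schema-free JSON, downstream plugins can route on the tag and store the record without agreeing on a fixed schema up front.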
Reliable streaming data transfer
✓ Transfers are retried on error
✓ After repeated errors, data is retried as a batch over another stream (micro batch)
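The retry behaviour sketched above amounts to retrying with exponential backoff. A simplified sketch in Ruby (illustrative, not Fluentd's actual retry code; parameter names are made up):

```ruby
# Simplified retry-with-exponential-backoff sketch, in the spirit of
# Fluentd's buffered output retries (not the real implementation).
def with_retries(max_attempts: 5, base_wait: 1.0)
  attempt = 0
  begin
    yield
  rescue => e
    attempt += 1
    raise e if attempt >= max_attempts      # give up after the last attempt
    sleep base_wait * (2 ** (attempt - 1))  # wait 1s, 2s, 4s, ...
    retry
  end
end
```

Buffering makes this safe: the data being retried sits in a buffer chunk, so a transient destination failure does not lose events.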
M x N → M + N
✓ Inputs: Apache access logs (frontend), app / system logs via syslogd (backend)
✓ Fluentd in the middle handles buffering / retrying / routing
✓ Outputs: Nagios (alerting), PostgreSQL and other databases, Hadoop (analysis), Amazon S3 (archiving), Elasticsearch
✓ With plugins, connecting M sources to N destinations needs M + N integrations instead of M x N
Use case
Simple forwarding
# logs from a file
<source>
  type tail
  path /var/log/httpd.log
  pos_file /tmp/pos_file
  format apache2
  tag backend.apache
</source>

# logs from client libraries
<source>
  type forward
  port 24224
</source>

# store logs to MongoDB
<match backend.*>
  type mongo
  database fluent
  collection test
</match>
Less Simple Forwarding
- At-most-once / At-least-once
- HA (failover)
- Load-balancing
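In Fluentd these are provided by the forward output, which accepts multiple destination servers including standby ones. The load-balancing-with-failover idea can be sketched in plain Ruby (class and field names here are illustrative, not Fluentd's actual code):

```ruby
# Illustrative sketch of round-robin load balancing with failover:
# rotate through destination servers, skipping ones marked down.
class Forwarder
  def initialize(servers)
    @servers = servers  # e.g. [{ host: "log1", alive: true }, ...]
    @index = 0
  end

  # Pick the next live server, or nil if every server is down
  # (in that case Fluentd would keep the data buffered and retry).
  def pick
    @servers.size.times do
      server = @servers[@index % @servers.size]
      @index += 1
      return server if server[:alive]
    end
    nil
  end
end
```

Combined with buffering, this gives at-least-once delivery: an event is only dropped from the buffer after a destination acknowledges it.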
Near realtime and batch combo!

# logs from a file
<source>
  type tail
  path /var/log/httpd.log
  pos_file /tmp/pos_file
  format apache2
  tag web.access
</source>

# logs from client libraries
<source>
  type forward
  port 24224
</source>

# store hot data to Elasticsearch and all data to HDFS
<match web.*>
  type copy
  <store>
    type elasticsearch
    logstash_format true
  </store>
  <store>
    type webhdfs
    host namenode
    port 50070
    path /path/on/hdfs/
  </store>
</match>
CEP for Stream Processing
Norikra is a SQL-based CEP engine: http://norikra.github.io/
Container Logging
> Kubernetes
> Google Compute Engine > https://cloud.google.com/logging/docs/install/compute_install
Fluentd on Kubernetes / GCE
Slideshare
http://engineering.slideshare.net/2014/04/skynet-project-monitor-scale-and-auto-heal-a-system-in-the-cloud/
Log Analysis System and its design at LINE Corp., early 2014
Architecture
Internal Architecture
Input → Parser → Filter → Buffer → Formatter → Output
(Parser is “input-ish”; Formatter is “output-ish”)
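The pipeline above can be pictured as composed stages. A toy sketch in Ruby (illustrative lambdas, not Fluentd's internal plugin API):

```ruby
require "json"

# Toy version of the Input -> Parser -> Filter -> Buffer -> Formatter ->
# Output pipeline (not Fluentd's internal API).
parser    = ->(line)   { { "message" => line.strip } }                   # parse raw input
filter    = ->(record) { record["message"] =~ /error/i ? record : nil }  # keep only errors
formatter = ->(record) { JSON.generate(record) }                         # format for output
buffer    = []                                                           # queue between stages

["boot ok", "ERROR: disk full"].each do |line|                           # input stage
  record = filter.call(parser.call(line))
  buffer << formatter.call(record) if record
end

buffer.each { |chunk| puts chunk }                                       # output stage
```

Each stage only knows its neighbours, which is why each of them can be swapped out independently as a plugin.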
Input plugins
✓ Receive logs
✓ Or pull logs from data sources
✓ Non-blocking
File tail (in_tail), Syslog (in_syslog), HTTP (in_http), HTTP/2 (in_http2, WIP), ...
Parser plugins
✓ Parse into JSON
✓ Common formats out of the box
✓ Some input plugins depend on parser plugins
✓ v0.10.46 and above
JSON, Regexp, Apache/Nginx/Syslog, CSV/TSV, etc.
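What a parser does can be shown with a plain regexp turning an Apache-style access log line into a structured record (a rough sketch; the real apache2 parser handles more fields and edge cases):

```ruby
require "json"

# Sketch of a parser plugin's job: raw log line in, structured record out.
LINE = '127.0.0.1 - - [14/Mar/2015:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 777'
PATTERN = /^(?<host>\S+) \S+ \S+ \[(?<time>[^\]]+)\] "(?<method>\S+) (?<path>\S+) [^"]+" (?<code>\d+) (?<size>\d+)/

m = LINE.match(PATTERN)
record = {
  "host"   => m[:host],
  "method" => m[:method],
  "path"   => m[:path],
  "code"   => m[:code].to_i,   # numeric fields converted from strings
  "size"   => m[:size].to_i
}
puts JSON.generate(record)
```

This is why in_tail's `format apache2` setting in the earlier config works: the input plugin delegates line parsing to a parser.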
Filter plugins
✓ Filter / mutate records
✓ Record level and stream level
✓ v0.12 and above
grep, record_transformer, suppress, …
Buffer plugins
✓ Improve performance
✓ Provide reliability
✓ Provide thread-safety
Memory (buf_memory), File (buf_file)
Buffer internal
✓ Chunk = adjustable unit of data
✓ Buffer = queue of chunks
(Input appends events to the current chunk; full chunks are queued and flushed to the output.)
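The chunk/queue structure can be sketched in a few lines of Ruby (illustrative, not Fluentd's buf_memory implementation; the size-only chunk limit is a simplification):

```ruby
# Sketch of a chunk-based buffer: events accumulate in the current chunk;
# when a chunk reaches its limit it moves to the queue, and the queue is
# flushed to the output in order.
class Buffer
  def initialize(chunk_limit)
    @chunk_limit = chunk_limit
    @chunk = []
    @queue = []
  end

  def append(event)
    @chunk << event
    if @chunk.size >= @chunk_limit   # chunk is full: enqueue it
      @queue << @chunk
      @chunk = []
    end
  end

  def flush(&output)
    @queue.each(&output)             # write each queued chunk
    @queue.clear
  end
end
```

Chunking is what makes retries cheap and batched: on failure, the whole chunk is simply re-queued and written again later.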
Formatter plugins
✓ Format output
✓ Some output plugins depend on formatter plugins
✓ v0.10.46 and above
JSON, CSV/TSV, “single value”, msgpack
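What a formatter produces can be shown with plain stdlib calls (illustrative of the output formats, not the actual formatter plugin API):

```ruby
require "json"
require "csv"

# The same record, serialized the way a JSON or CSV formatter would emit it.
record = { "host" => "127.0.0.1", "code" => 200 }

json_line = JSON.generate(record)                       # JSON formatter
csv_line  = record.values_at("host", "code").to_csv.chomp  # CSV formatter

puts json_line
puts csv_line
```

Keeping formatting separate from writing lets one output plugin (say, out_file) emit JSON, CSV, or msgpack just by swapping the formatter.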
Output plugins
✓ Write to external systems
✓ Buffered & non-buffered
✓ 200+ plugins
File (out_file), Amazon S3 (out_s3), MongoDB (out_mongo), ...
Roadmap
> v0.10 (old stable)
> v0.12 (current stable)
  > Filter / Label / At-least-once
> v0.14 (spring, 2015)
  > New plugin APIs, ServerEngine, Time…
> v1 (summer, 2015)
  > Fix new features / APIs
https://github.com/fluent/fluentd/wiki/V1-Roadmap
Goodies
fluent-bit
> Made for Embedded Linux
  > OpenEmbedded & Yocto Project
  > Intel Edison, RasPi & BeagleBone Black boards
> Standalone application or library mode
> Built-in plugins
  > input: cpu, kmsg; output: fluentd
> First release at the end of Mar 2015
> https://github.com/fluent/fluent-bit
fluentd-ui
> Manage Fluentd instances via a Web UI
> https://github.com/fluent/fluentd-ui
Treasure Agent (td-agent)
> Treasure Data distribution of Fluentd
  > includes Ruby and QA’ed plugins
> Treasure Agent 2 is the current stable
  > we recommend v2, not v1
  > includes fluentd-ui
> The next release, 2.2.0, uses Fluentd v0.12
Embulk
> Bulk-loader counterpart of Fluentd
> Pluggable architecture
  > JRuby, JVM languages
> High-performance parallel processing
> Share your scripts as plugins
> https://github.com/embulk
http://www.slideshare.net/frsyuki/embuk-making-data-integration-works-relaxed
[Diagram: Embulk bulk-loads data between pluggable sources and sinks such as HDFS, MySQL, Amazon S3, CSV files, SequenceFile, Salesforce.com, Elasticsearch, Cassandra, Hive, and Redis]
✓ Parallel execution
✓ Data validation
✓ Error recovery
✓ Deterministic behaviour
✓ Idempotent retrying
Check: treasuredata.com, a cloud service for the entire data pipeline