rubykaigi 2014: serverengine
DESCRIPTION
http://rubykaigi.org/2014/presentation/S-MasahiroNakagawa Movie in Japanese: http://www.ustream.tv/recorded/52839701TRANSCRIPT
ServerEngineServer programming framework for Ruby
Masahiro NakagawaSep 19, 2014
RubyKaigi 2014
Who are you?
> Masahiro Nakagawa > github/twitter: @repeatedly
> Treasure Data, Inc. > Senior Software Engineer > Fluentd / td-agent developer
> I love OSS :) > D language - Phobos committer > Fluentd - Main maintainer > MessagePack / RPC- D and Python (only RPC) > The organizer of Presto Source Code Reading > etc…
0. Background + Intro
Ruby is not only for web apps!
> System programs • Chef - server configuration management tool • Serverspec - spec framework for servers • Apache Deltacloud - IaaS API abstraction library
> Network servers • Starling - distributed message queue server • Unicorn - multiprocess HTTP server
> Log servers • Fluentd - extensible data collection tool
Problem: server programming is hard
Server programs should support: > multi-process or multi-thread > robust error handling > log rotation > signal handling > dynamic reconfiguration > metrics collection > etc...
Solution: Use a framework
ServerEngineA framework for server programming in Ruby
github.com/fluent/serverengine
What’s ServerEngine?
With ServerEngine, we can write multi-process server programs, like Unicorn, easily.
What we need to write is a 2 modules: Worker module and Server module.
Everything else, including daemonize, logging, dynamic reconfiguration, multi-processingis done by ServerEngine.
require 'serverengine' !module MyWorker def run until @stop logger.info "Hello world!" sleep 1 end end ! def stop @stop = true end end !se = ServerEngine.create(nil, MyWorker, { log: 'myserver.log', pid_path: 'myserver.pid', }) se.run
Hello world in ServerEngine
Worker
Server
Config
How ServerEngine works?
1. Robust process management (supervisor)
2. Multi-process and multi-threading
3. Dynamic configuration reloading
4. Log rotation
5. Signal handling
6. Live restart
7. “sigdump”
1. Robust process management
Supervisor Server
Heartbeat via pipe & auto-restart
Dynamic reconfiguration & live restart support
Multi-process or Multi-threadWorker
Worker
Worker
Server WorkerSupervisor
• Manage Server • heartbeat • attach / detach • restart !
• Disable by default !
• No extension point
• Manage Worker • monitor • restart !
• Some execution types • Embedded • Thread • Process !
• Extension point • before_run • after_run • after_restart
Each role overview
• Execution unit • implement run
method !
• Extension point • stop • reload • before_fork • after_start
2. Multi-process & multi-threading
require 'serverengine' !module MyWorker def run until @stop logger.info "Awesome work!" sleep 1 end end ! def stop @stop = true end end !se = ServerEngine.create(nil, MyWorker, { daemonize: true, log: 'myserver.log', pid_path: 'myserver.pid', worker_type: 'process', workers: 4, }) se.run
se = ServerEngine.create(nil, MyWorker,{ daemonize: true, log: 'myserver.log', pid_path: 'myserver.pid', worker_type: 'thread', workers: 8, }) se.run
> 3 server types > embedded > process > thread
- thread example
Worker
2. Multi-process & multi-threadingprocess threadembedded
Server
Worker
• default mode • use main thread
• use fork for parallel execution
• not work on Windows
• use thread for parallel execution
• for JRuby and Rubinius
Server
WorkerWorker
fork
Worker
Server
WorkerWorker
Thread.new
3. Dynamic reconfiguration
module MyWorker def initialize reload end ! def run # … end ! def reload @message = config["message"] || "default" @sleep = config["sleep"] || 1 end end !se = ServerEngine.create(nil, MyWorker) do YAML.load_file("config.yml").merge({ :daemonize => true, :worker_type => 'process', }) end se.run
> Overwrite method > reload
in worker > reload_config
in server > Send USR2 signal
4. Log rotation
se = ServerEngine.create(MyServer, MyWorker, { log: 'myserver.log', log_level: 'debug', log_rotate_age: 5, log_rotate_size: 1 * 1024 * 1024, }) se.run
> Support useful features > multi-process aware log rotation > support “trace” level
> Port to Ruby core > https://github.com/ruby/ruby/pull/428
5. Signal handling> Queue based signal handling
> serialize signal processing > signal handling is separated from
signal handler to avoid lock issues
SignalThread.new do |st| st.trap(:TERM) { server.stop(true) } st.trap(:QUIT) { server.stop(false) } st.trap(:USR1) { server.restart(true) } st.trap(:HUP) { server.restart(false) } st.trap(:USR2) { server.reload } # ... end
INT { process_int }
5. Signal handling - register 1
SignalThreadRegister Signal
INT
USR1 { process_usr1 }
INT { process_int }
5. Signal handling - register 2
SignalThreadRegister Signal
USR1
5. Signal handling - register 3
SignalThread
QUIT { process_quit }
TERM { process_term }
USR1 { process_usr1 }
INT { process_int }
Register Signal
XXX
5. Signal handling - process 1
SignalThread
QUIT { process_quit }
TERM { process_term }
USR1 { process_usr1 }
{ process_int }
USR1
Send Signal
USR1
Monitor queue
INT
5. Signal handling - process 2
SignalThread
QUIT { process_quit }
TERM { process_term }
USR1 { process_usr1 }
INT { process_int }
USR1
QUIT
Send Signal
QUIT
5. Signal handling - process 3
SignalThread
QUIT { process_quit }
TERM { process_term }
USR1 { process_usr1 }
INT { process_int }
QUIT
Send Signal
XXX
6. Live Restart> Minimize server restart downtime
> via INT signal > enable_detach and
supervisor parameters must be true > Network server can’t use live restart
> “Address already in use” occurred > use “process” worker and USR1 instead
• restart workers, not server
6. Live Restart - flow 1
Supervisor Server WorkerWorkerWorker
1. start a server
Supervisor Server WorkerWorkerWorker
6. Live Restart - flow 2
2. receive SIGINT and wait for shutdown of the server
Supervisor Server
6. Live Restart - flow 3
3. start new server if the server doesn’t exit in server_detach_wait
Server WorkerWorkerWorker
Worker
7. “sigdump”
> SIGQUIT of JavaVM for Ruby > https://github.com/frsyuki/sigdump > dump backtrace of running threads
and allocated object list > for debugging, slow code, dead-lock, …
> ServerEngine traps SIGCONT for sigdump > Trapping signal is configurable using
“SIGDUMP_SIGNAL” environment variable
7. sigdump example% kill -CONT pid% cat /tmp/sigdump-66276.log
Sigdump at 2014-09-18 18:44:43 +0900 process 66276 (se.rb) Thread #<Thread:0x007fdc130cb7e0> status=sleep priority=0 se.rb:7:in `sleep' se.rb:7:in `run' /Users/repeatedly/.rbenv/versions/2.1.2/lib/ruby/gems/2.1.0/gems/serverengine-1.5.9/lib/serverengine/worker.rb:67:in `main' /Users/repeatedly/.rbenv/versions/2.1.2/lib/ruby/gems/2.1.0/gems/serverengine-1.5.9/lib/serverengine/embedded_server.rb:24:in `run' /Users/repeatedly/.rbenv/versions/2.1.2/lib/ruby/gems/2.1.0/gems/serverengine-1.5.9/lib/serverengine/server.rb:85:in `main' /Users/repeatedly/.rbenv/versions/2.1.2/lib/ruby/gems/2.1.0/gems/serverengine-1.5.9/lib/serverengine/daemon.rb:101:in `main' … Thread #<ServerEngine::SignalThread:0x007fdc13314038> status=run priority=0 … Built-in objects: 33,017: TOTAL 16,845: T_STRING … All objects: 8,939: String 1,459: Array … String 210,569 bytes Array 4 elements Hash 14 pairs
> A fast background processing framework for Ruby > use ServerEngine and RabbitMQ > jondot.github.io/sneakers/
Use-case1: Sneakers
Server WorkerWorkerWorker
Sneakers RabbitMQ
Task
Use-case2: Fluentd v1> Data collector for unified logging layer
> http://www.fluentd.org/ > Improve core features
> Logging > Signal handling
> New features based on ServerEngine > Multi-process support > Zero downtime restart > etc…
Fluentd v1 - Multi-process
Worker
Supervisor
Worker Worker
Separate stream pipelines in one instance!
<Worker> input tail output forward </worker>
<Worker> input forward output webhdfs </worker>
<Worker> input foo output bar </worker>
> SocketManager shares the resource
32
SupervisorTCP
1. Listen TCP socket
Fluentd v1 - Zero downtime restart
> SocketManager shares the resource
33
Worker
Supervisor
heartbeat
TCP
TCP
1. Listen TCP socket
2. Pass its socket to worker
Fluentd v1 - Zero downtime restart
> SocketManager shares the resource
34
Worker
Supervisor
Worker
TCP
TCP
1. Listen TCP socket
2. Pass its socket to worker
3. Do same actionat worker restartingwith keeping TCP socket
heartbeat
Fluentd v1 - Zero downtime restart
Demo (if I have a time…)
Check: www.treasuredata.com
Cloud service for the entire data pipeline