distributed ruby and rails

Distributed Ruby and Rails @ihower http://ihower.tw 2010/1

Post on 12-Sep-2014


Page 1: Distributed Ruby and Rails

Distributed Ruby and Rails

@ihower
http://ihower.tw

2010/1

Page 2: Distributed Ruby and Rails

About Me

• 張文鈿 a.k.a. ihower

• http://ihower.tw

• http://twitter.com/ihower

• http://github.com/ihower

• Ruby on Rails Developer since 2006

• Ruby Taiwan Community

• http://ruby.tw

Page 3: Distributed Ruby and Rails

Agenda

• Distributed Ruby

• Distributed Message Queues

• Background-processing in Rails

• Message Queues for Rails

• SOA for Rails

• Distributed Filesystem

• Distributed database

Page 4: Distributed Ruby and Rails

1. Distributed Ruby

• DRb

• Rinda

• Starfish

• MapReduce

• MagLev VM

Page 5: Distributed Ruby and Rails

DRb

• Ruby's RMI (remote method invocation) system

• an object in one Ruby process can invoke methods on an object in another Ruby process on the same or a different machine

Page 6: Distributed Ruby and Rails

DRb (cont.)

• no defined interface, so development is faster

• tightly couples applications, because there is no defined API, only methods on objects

• unreliable in large-scale, heavy-load production environments

Page 7: Distributed Ruby and Rails

server example 1

require 'drb'

class HelloWorldServer
  def say_hello
    'Hello, world!'
  end
end

DRb.start_service("druby://127.0.0.1:61676", HelloWorldServer.new)
DRb.thread.join

Page 8: Distributed Ruby and Rails

client example 1

require 'drb'

server = DRbObject.new_with_uri("druby://127.0.0.1:61676")

puts server.say_hello
puts server.inspect

# Hello, world!
# <DRb::DRbObject:0x1003c04c8 @ref=nil, @uri="druby://127.0.0.1:61676">

Page 9: Distributed Ruby and Rails

example 2

# user.rb
class User
  attr_accessor :username
end

Page 10: Distributed Ruby and Rails

server example 2

require 'drb'
require_relative 'user' # user.rb sits next to this file

class UserServer
  attr_accessor :users

  def find(id)
    self.users[id - 1]
  end
end

user_server = UserServer.new
user_server.users = []
5.times do |i|
  user = User.new
  user.username = i + 1
  user_server.users << user
end

DRb.start_service("druby://127.0.0.1:61676", user_server)
DRb.thread.join

Page 11: Distributed Ruby and Rails

client example 2

require 'drb'

user_server = DRbObject.new_with_uri("druby://127.0.0.1:61676")

user = user_server.find(2)

puts user.inspect
puts "Username: #{user.username}"
user.username = "ihower"
puts "Username: #{user.username}"

Page 12: Distributed Ruby and Rails

Err...

# <DRb::DRbUnknown:0x1003b8318 @name="User", @buf="\004\bo:\tUser\006:\016@usernamei\a">
# client2.rb:8: undefined method `username' for #<DRb::DRbUnknown:0x1003b8318> (NoMethodError)

Page 13: Distributed Ruby and Rails

Why? DRbUndumped

• Default DRb operation

  • Pass by value

  • Must share code

• With DRbUndumped

  • Pass by reference

  • No need to share code
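The difference between the two modes can be seen without any networking. The sketch below assumes only Ruby's standard drb library; the class names ValueUser and ReferenceUser and the helper dumpable? are made up for illustration. DRb passes an object by value by marshalling it, so the receiver works on a copy. An object that includes DRbUndumped refuses to be marshalled, and DRb sends a reference to it instead.

```ruby
require 'drb'

# Plain class: DRb would marshal it and hand the client a copy.
class ValueUser
  attr_accessor :username
end

# DRbUndumped marks the class as "do not copy me".
class ReferenceUser
  include DRb::DRbUndumped
  attr_accessor :username
end

# Hypothetical helper: can this object be serialized with Marshal?
def dumpable?(object)
  Marshal.dump(object)
  true
rescue TypeError
  false
end

original = ValueUser.new
original.username = 2

# Simulate pass by value: serialize, then rebuild on the "other side".
copy = Marshal.load(Marshal.dump(original))
copy.username = "ihower"

puts original.username             # the original is untouched by the change
puts dumpable?(ValueUser.new)      # a plain object can be copied
puts dumpable?(ReferenceUser.new)  # an undumped object cannot, so DRb sends a reference
```

This also explains why pass by value needs shared code: the receiving side has to know the class in order to rebuild the copy.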

Page 14: Distributed Ruby and Rails

Example 2 Fixed

# user.rb
class User
  include DRbUndumped
  attr_accessor :username
end

# <DRb::DRbObject:0x1003b84f8 @ref=2149433940, @uri="druby://127.0.0.1:61676">
# Username: 2
# Username: ihower

Page 15: Distributed Ruby and Rails

Why use DRbUndumped?

• Big objects

• Singleton objects

• Lightweight clients

• Rapidly changing software

Page 16: Distributed Ruby and Rails

ID conversion

• Converts reference into DRb object on server

• DRbIdConv (Default)

• TimerIdConv

• NamedIdConv

• GWIdConv

Page 17: Distributed Ruby and Rails

Beware of garbage collection

• referenced objects may be collected on server (usually doesn't matter)

• Build your own ID converter if you want to control persistent state.

Page 18: Distributed Ruby and Rails

DRb security

require 'drb'

ro = DRbObject.new_with_uri("druby://127.0.0.1:61676")
class << ro
  undef :instance_eval
end

# !!!!!!!! WARNING !!!!!!!!! DO NOT RUN
ro.instance_eval("`rm -rf *`")

Page 19: Distributed Ruby and Rails

$SAFE=1

instance_eval': Insecure operation - instance_eval (SecurityError)

Page 20: Distributed Ruby and Rails

DRb security (cont.)

• Access Control Lists (ACLs)

• via IP address array

• still can run denial-of-service attack

• DRb over SSL

Page 21: Distributed Ruby and Rails

Rinda

• Rinda is a Ruby port of the Linda distributed computing paradigm.

• Linda is a model of coordination and communication among several parallel processes operating upon objects stored in and retrieved from shared, virtual, associative memory. This model is implemented as a "coordination language" in which several primitives operating on ordered sequence of typed data objects, "tuples," are added to a sequential language, such as C, and a logically global associative memory, called a tuplespace, in which processes store and retrieve tuples. (Wikipedia)

Page 22: Distributed Ruby and Rails

Rinda (cont.)

• Rinda consists of:

• a TupleSpace implementation

• a RingServer that allows DRb services to automatically discover each other.

Page 23: Distributed Ruby and Rails

RingServer

• We hardcoded IP addresses in the DRb programs; that couples the applications tightly and makes fault tolerance difficult.

• With RingServer, a program can detect and interact with other services on the network without knowing their IP addresses.

Page 24: Distributed Ruby and Rails

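[Diagram: Client @ 192.168.1.100, RingServer (reached via the UDP broadcast address), Service X @ 192.168.1.12]

1. Client to RingServer: Where is Service X?

2. RingServer to Client: Service X: 192.168.1.12

3. Client to Service X: Hi, Service X @ 192.168.1.12

4. Service X to Client: Hi there, 192.168.1.100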

Page 25: Distributed Ruby and Rails

ring server example

require 'rinda/ring'
require 'rinda/tuplespace'

DRb.start_service
Rinda::RingServer.new(Rinda::TupleSpace.new)
DRb.thread.join

Page 26: Distributed Ruby and Rails

service example

require 'rinda/ring'

class HelloWorldServer
  include DRbUndumped # Needed for RingServer

  def say_hello
    'Hello, world!'
  end
end

DRb.start_service
ring_server = Rinda::RingFinger.primary
ring_server.write([:hello_world_service, :HelloWorldServer, HelloWorldServer.new, 'I like to say hi!'],
                  Rinda::SimpleRenewer.new)

DRb.thread.join

Page 27: Distributed Ruby and Rails

client example

require 'rinda/ring'

DRb.start_service
ring_server = Rinda::RingFinger.primary

service = ring_server.read([:hello_world_service, nil, nil, nil])
server = service[2]

puts server.say_hello
puts service.inspect

# Hello, world!
# [:hello_world_service, :HelloWorldServer, #<DRb::DRbObject:0x10039b650 @uri="druby://fe80::21b:63ff:fec9:335f%en1:57416", @ref=2149388540>, "I like to say hi!"]

Page 28: Distributed Ruby and Rails

TupleSpaces

• Shared object space

• Atomic access

• Just like bulletin board

• Tuple template is [:name, :Class, object, 'description']

Page 29: Distributed Ruby and Rails

5 Basic Operations

• write

• read

• take (Atomic Read+Delete)

• read_all

• notify (Callback for write/take/delete)
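Most of these operations can be tried in a single process, because Rinda::TupleSpace works locally without a RingServer. The following is a minimal sketch; the :job tuples are invented for illustration, and notify is left out to keep it short.

```ruby
require 'rinda/tuplespace'

space = Rinda::TupleSpace.new

# write: post two tuples on the "bulletin board"
space.write([:job, 1, 'resize photo'])
space.write([:job, 2, 'send mail'])

# read_all: every tuple matching the template (nil matches anything)
all_jobs = space.read_all([:job, nil, nil])

# read: return a matching tuple but leave it in the space
peeked = space.read([:job, 1, nil])

# take: return a matching tuple and remove it atomically
taken = space.take([:job, 1, nil])

remaining = space.read_all([:job, nil, nil])

puts all_jobs.size
puts peeked.inspect
puts taken.inspect
puts remaining.inspect
```

Because take removes the tuple atomically, two workers taking from the same space never receive the same job.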

Page 30: Distributed Ruby and Rails

Starfish

• Starfish is a utility to make distributed programming ridiculously easy

• It runs both the server and the client in infinite loops

• MapReduce with ActiveRecord or Files

Page 31: Distributed Ruby and Rails

starfish foo.rb

# foo.rb

class Foo
  attr_reader :i

  def initialize
    @i = 0
  end

  def inc
    logger.info "YAY it incremented by 1 up to #{@i}"
    @i += 1
  end
end

server :log => "foo.log" do |object|
  object = Foo.new
end

client do |object|
  object.inc
end

Page 32: Distributed Ruby and Rails

starfish server example

ARGV.unshift('server.rb')

require 'rubygems'
require 'starfish'

class HelloWorld
  def say_hi
    'Hi There'
  end
end

Starfish.server = lambda do |object|
  object = HelloWorld.new
end

Starfish.new('hello_world').server

Page 33: Distributed Ruby and Rails

starfish client example

ARGV.unshift('client.rb')

require 'rubygems'
require 'starfish'

Starfish.client = lambda do |object|
  puts object.say_hi
  exit(0) # exit program immediately
end

Starfish.new('hello_world').client

Page 34: Distributed Ruby and Rails

starfish client example (another way)

ARGV.unshift('server.rb')

require 'rubygems'
require 'starfish'

catch(:halt) do
  Starfish.client = lambda do |object|
    puts object.say_hi
    throw :halt
  end
  Starfish.new('hello_world').client
end

puts "bye bye"

Page 35: Distributed Ruby and Rails

MapReduce

• introduced by Google to support distributed computing on large data sets on clusters of computers.

• inspired by map and reduce functions commonly used in functional programming.
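The idea can be sketched in plain Ruby before bringing in Starfish. This is only an in-process illustration with made-up log lines: a map step runs on each chunk of input (as a client would), and a reduce step merges the partial results.

```ruby
# Hypothetical sample data standing in for a log file: "path status".
LOG_LINES = [
  '/index.html 200',
  '/missing 404',
  '/index.html 200',
  '/about.html 200',
  '/broken 500',
  '/missing 404'
].freeze

# Map step: each worker turns its chunk of lines into partial counts.
def map_chunk(lines)
  lines.each_with_object(Hash.new(0)) do |line, counts|
    counts[line.split.last] += 1
  end
end

# Reduce step: merge the partial counts into one result.
def reduce_counts(partials)
  partials.reduce({}) do |total, partial|
    total.merge(partial) { |_status, a, b| a + b }
  end
end

# Split the input into chunks of two lines, as if handing them to clients.
partials = LOG_LINES.each_slice(2).map { |chunk| map_chunk(chunk) }
status_counts = reduce_counts(partials)

puts status_counts.inspect
```

In a real cluster the chunks would be processed on different machines; only the small partial hashes travel back to be reduced.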

Page 36: Distributed Ruby and Rails

starfish server example

ARGV.unshift('server.rb')

require 'rubygems'
require 'starfish'

Starfish.server = lambda { |map_reduce|
  map_reduce.type = File
  map_reduce.input = "/var/log/apache2/access.log"
  map_reduce.queue_size = 10
  map_reduce.lines_per_client = 5
  map_reduce.rescan_when_complete = false
}

Starfish.new('log_server').server

Page 37: Distributed Ruby and Rails

starfish client example

ARGV.unshift('client.rb')

require 'rubygems'
require 'starfish'

Starfish.client = lambda { |logs|
  logs.each do |log|
    puts "Processing #{log}"
    sleep(1)
  end
}

Starfish.new("log_server").client

Page 38: Distributed Ruby and Rails

Other implementations

• Skynet

• Use TupleSpace or MySQL as message queue

• Include an extension for ActiveRecord

• http://skynet.rubyforge.org/

• MRToolkit based on Hadoop

• http://code.google.com/p/mrtoolkit/

Page 39: Distributed Ruby and Rails

MagLev VM

• a fast, stable, Ruby implementation with integrated object persistence and distributed shared cache.

• http://maglev.gemstone.com/

• public Alpha currently

Page 40: Distributed Ruby and Rails

2. Distributed Message Queues

• Starling

• AMQP/RabbitMQ

• Stomp/ActiveMQ

• beanstalkd

Page 41: Distributed Ruby and Rails

What's a message queue?

[Diagram: the Client puts Message X on the Queue; the Processor checks the Queue and processes the message]
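The client/queue/processor relationship can be sketched inside one Ruby process using the core Queue class and a thread. This is an analogy only: a distributed message queue puts the queue on its own server, and the :shutdown marker here is an invented convention.

```ruby
queue = Queue.new          # thread-safe FIFO from Ruby's core library
processed = []

# The processor waits on the queue and handles messages as they arrive.
processor = Thread.new do
  loop do
    message = queue.pop    # blocks until a message is available
    break if message == :shutdown
    processed << "handled #{message}"
  end
end

# The client only enqueues and moves on; it never waits for the work.
%w[X Y Z].each { |name| queue << "Message #{name}" }
queue << :shutdown

processor.join
puts processed.inspect
```

The client and the processor never call each other directly, which is what lets them live in different processes or on different machines.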

Page 42: Distributed Ruby and Rails

Why not DRb?

• DRb carries security risks and has poorly designed APIs

• A distributed message queue is a great way to do distributed programming: it is reliable and scalable.

Page 43: Distributed Ruby and Rails

Starling

• a light-weight persistent queue server that speaks the Memcache protocol (mimics its API)

• Fast, effective, quick setup and ease of use

• Powered by EventMachine
  http://eventmachine.rubyforge.org/EventMachine.html

• Twitter's open source project; they used it before 2009 (they have since switched to Kestrel, a port of Starling from Ruby to Scala)

Page 44: Distributed Ruby and Rails

Starling command

• sudo gem install starling-starling

• http://github.com/starling/starling

• sudo starling -h 192.168.1.100

• sudo starling_top -h 192.168.1.100

Page 45: Distributed Ruby and Rails

Starling set example

require 'rubygems'
require 'starling'

starling = Starling.new('192.168.1.4:22122')

100.times do |i|
  starling.set('my_queue', i)
end

# set appends to the queue; it does not overwrite the value as it would in Memcached

Page 46: Distributed Ruby and Rails

Starling get example

require 'rubygems'
require 'starling'

starling = Starling.new('192.168.2.4:22122')

loop do
  puts starling.get("my_queue")
end

Page 47: Distributed Ruby and Rails

get method

• FIFO

• After get, the object is no longer in the queue. You will lose the message if an error happens during processing.

• The get method blocks until something is returned. Internally it is an infinite loop.

Page 48: Distributed Ruby and Rails

Handle processing error exception

require 'rubygems'
require 'starling'

starling = Starling.new('192.168.2.4:22122')
results = starling.get("my_queue")

begin
  puts results.flatten
rescue NoMethodError => e
  puts e.message
  starling.set("my_queue", [results]) # wrap in an array and put it back
rescue Exception => e
  starling.set("my_queue", results)   # put the message back before re-raising
  raise e
end
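The requeue-on-failure pattern can be tried without a running queue server. The sketch below uses a hypothetical in-memory FakeQueue that only mimics the set/get calls used above; it is not the Starling client, and its get returns nil instead of blocking.

```ruby
# Hypothetical in-memory stand-in for a queue client with set/get.
class FakeQueue
  def initialize
    @queues = Hash.new { |hash, name| hash[name] = [] }
  end

  def set(name, value)
    @queues[name].push(value)   # append, FIFO
  end

  def get(name)
    @queues[name].shift         # returns nil when empty (no blocking here)
  end

  def size(name)
    @queues[name].size
  end
end

# Take one message and hand it to the block; put it back if the block raises.
def process_next(queue, name)
  message = queue.get(name)
  return false if message.nil?

  yield message
  true
rescue StandardError
  queue.set(name, message)      # requeue so the message is not lost
  false
end

queue = FakeQueue.new
queue.set('my_queue', 'job-1')

first_try  = process_next(queue, 'my_queue') { |_message| raise 'worker crashed' }
second_try = process_next(queue, 'my_queue') { |message| puts "done: #{message}" }
```

Note that a requeued message goes to the back of the queue, so strict ordering is not preserved after a failure.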

Page 49: Distributed Ruby and Rails

Starling cons

• Workers have to poll the queue constantly

• With RabbitMQ you can subscribe to a queue and be notified when a message is available for processing.

Page 50: Distributed Ruby and Rails

AMQP/RabbitMQ

• a complete and highly reliable enterprise messaging system based on the emerging AMQP standard.

• Erlang

• http://github.com/tmm1/amqp

• Powered by EventMachine

Page 51: Distributed Ruby and Rails

Stomp/ActiveMQ

• Apache ActiveMQ is the most popular and powerful open source messaging and Integration Patterns provider.

• sudo gem install stomp

• ActiveMessaging plugin for Rails

Page 52: Distributed Ruby and Rails

beanstalkd

• Beanstalk is a simple, fast workqueue service. Its interface is generic, but was originally designed for reducing the latency of page views in high-volume web applications by running time-consuming tasks asynchronously.

• http://kr.github.com/beanstalkd/

• http://beanstalk.rubyforge.org/

• Open source; originally built for Causes, a Facebook application

Page 53: Distributed Ruby and Rails

Why do we need asynchronous/background processing in Rails?

• cron-like processing (computing daily statistics, creating reports, updating the full-text search index, etc.)

• long-running tasks (sending mail, resizing photos, encoding videos, generating PDFs, uploading images to S3, posting to Twitter, etc.)

• Server traffic jam: an expensive request blocks server resources (i.e. your Rails app)

• Bad user experience: users may reload again and again! (responsiveness matters)

Page 54: Distributed Ruby and Rails

3. Background-processing for Rails

• script/runner

• rake

• cron

• daemon

• run_later plugin

• spawn plugin

Page 55: Distributed Ruby and Rails

script/runner

• In Your Rails App root:

• script/runner "Worker.process"

Page 56: Distributed Ruby and Rails

rake

• In RAILS_ROOT/lib/tasks/dev.rake

• rake dev:process

namespace :dev do
  task :process do
    #...
  end
end

Page 57: Distributed Ruby and Rails

cron

• Cron is a time-based job scheduler in Unix-like computer operating systems.

• crontab -e

Page 58: Distributed Ruby and Rails

Whenever
http://github.com/javan/whenever

• A Ruby DSL for Defining Cron Jobs

• http://asciicasts.com/episodes/164-cron-in-ruby

• or http://cronedit.rubyforge.org/

every 3.hours do
  runner "MyModel.some_process"
  rake "my:rake:task"
  command "/usr/bin/my_great_command"
end

Page 59: Distributed Ruby and Rails

Daemon

• http://daemons.rubyforge.org/

• http://github.com/dougal/daemon_generator/

Page 60: Distributed Ruby and Rails

rufus-scheduler
http://github.com/jmettraux/rufus-scheduler

• schedules pieces of code (jobs)

• Not a replacement for cron/at, since it runs inside of Ruby.

require 'rubygems'
require 'rufus/scheduler'

scheduler = Rufus::Scheduler.start_new

scheduler.every '5s' do
  puts 'check blood pressure'
end

scheduler.join

Page 61: Distributed Ruby and Rails

Daemon Kit
http://github.com/kennethkalmer/daemon-kit

• Eases the creation of Ruby daemons by providing a sound application skeleton (through a generator), task-specific generators (jabber bot, etc.) and robust environment management code.

Page 62: Distributed Ruby and Rails

Monitor your daemon

• http://mmonit.com/monit/

• http://github.com/arya/bluepill

• http://god.rubyforge.org/

Page 63: Distributed Ruby and Rails

daemon_controller
http://github.com/FooBarWidget/daemon_controller

• A library for robust daemon management

• Make daemon-dependent applications Just Work without having to start the daemons manually.

Page 64: Distributed Ruby and Rails

off-load task via system command

# mailings_controller.rb
def deliver
  call_rake :send_mailing, :mailing_id => params[:id].to_i
  flash[:notice] = "Delivering mailing"
  redirect_to mailings_url
end

# controllers/application.rb
def call_rake(task, options = {})
  options[:rails_env] ||= Rails.env
  args = options.map { |n, v| "#{n.to_s.upcase}='#{v}'" }
  # redirect stdout first, then stderr, so both end up in rake.log; "&" runs it in the background
  system "/usr/bin/rake #{task} #{args.join(' ')} --trace >> #{Rails.root}/log/rake.log 2>&1 &"
end

# lib/tasks/mailer.rake
desc "Send mailing"
task :send_mailing => :environment do
  mailing = Mailing.find(ENV["MAILING_ID"])
  mailing.deliver
end

# models/mailing.rb
def deliver
  sleep 10 # placeholder for sending email
  update_attribute(:delivered_at, Time.now)
end

Page 65: Distributed Ruby and Rails

Simple Thread

after_filter do
  Thread.new do
    AccountMailer.deliver_signup(@user)
  end
end

Page 66: Distributed Ruby and Rails

run_later plugin http://github.com/mattmatt/run_later

• Borrowed from Merb

• Uses worker thread and a queue

• Simple solution for simple tasks

run_later do
  AccountMailer.deliver_signup(@user)
end

Page 67: Distributed Ruby and Rails

spawn plugin
http://github.com/tra/spawn

spawn do
  logger.info("I feel sleepy...")
  sleep 11
  logger.info("Time to wake up!")
end

Page 68: Distributed Ruby and Rails

spawn (cont.)

• By default, spawn uses fork to spawn child processes. You can configure it to use threads instead.

• Works by creating new database connections in ActiveRecord::Base for the spawned block.

• Forking has to copy the Rails process every time

Page 69: Distributed Ruby and Rails

threading vs. forking

• Forking advantages:

  • more reliable? - the ActiveRecord code is not thread-safe.

  • keep running - a subprocess can live longer than its parent.

  • easier - just works with Rails default settings. Threading requires you to set allow_concurrency=true. Also, beware of automatic reloading of classes in development mode (config.cache_classes = false).

• Threading advantages:

  • less filling - threads take fewer resources... how much less? It depends.

  • debugging - you can set breakpoints in your threads
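The core difference behind these trade-offs is memory: a thread shares the parent's memory, while a forked child works on its own copy. A minimal sketch in plain Ruby (no Rails or spawn plugin involved; fork is not available on Windows):

```ruby
counter = 0

# A thread runs inside the same process and shares its memory.
Thread.new { counter += 1 }.join

# A forked child gets its own copy of the process memory.
reader, writer = IO.pipe
pid = fork do
  reader.close
  counter += 100            # changes only the child's copy
  writer.puts counter       # report the child's value through a pipe
  writer.close
  exit!(0)                  # leave the child without running at_exit hooks
end
writer.close
child_counter = reader.gets.to_i
reader.close
Process.wait(pid)

puts "parent sees #{counter}, child saw #{child_counter}"
```

The copy is also why a forked block needs its own database connection, and why results have to come back through a pipe, a file, or the database.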

Page 70: Distributed Ruby and Rails

Okay, we need a reliable messaging system:

• Persistent

• Scheduling: not necessarily all at the same time

• Scalability: just throw in more instances of your program to speed up processing

• Loosely coupled components that merely ‘talk’ to each other

• Ability to easily replace Ruby with something else for specific tasks

• Easy to debug and monitor

Page 71: Distributed Ruby and Rails

4. Message Queues (for Rails only)

• ar_mailer

• BackgrounDRb

• workling

• delayed_job

• resque

Page 72: Distributed Ruby and Rails

Rails only?

• Easy to use/write code

• Jobs are Ruby classes or objects

• But need to load Rails environment

Page 73: Distributed Ruby and Rails

ar_mailer
http://seattlerb.rubyforge.org/ar_mailer/

• a two-phase delivery agent for ActionMailer.

• Stores messages in the database

• A separate process, ar_sendmail, delivers them later.

Page 74: Distributed Ruby and Rails

BackgrounDRb
http://backgroundrb.rubyforge.org/

• BackgrounDRb is a Ruby job server and scheduler.

• Has scalability problems (~20 servers for Mark Bates)

• Hard to tell whether processing failed

• Uses the database to persist tasks

• Uses memcached to report processing results

Page 75: Distributed Ruby and Rails

workling
http://github.com/purzelrakete/workling

• Gives your Rails app a simple API that you can use to make code run in the background, outside of your request.

• Supports Starling (default), BackgroundJob, Spawn and AMQP/RabbitMQ runners.

Page 76: Distributed Ruby and Rails

Workling/Starling setup

• script/plugin install git://github.com/purzelrakete/workling.git

• sudo starling -p 15151

• RAILS_ENV=production script/workling_client start

Page 77: Distributed Ruby and Rails

Workling example

class EmailWorker < Workling::Base
  def deliver(options)
    user = User.find(options[:id])
    user.deliver_activation_email
  end
end

# in your controller
def create
  EmailWorker.asynch_deliver(:id => 1)
end

Page 78: Distributed Ruby and Rails

delayed_job

• Database backed asynchronous priority queue

• Extracted from Shopify

• you can place any Ruby object on its queue as arguments

• Loads the Rails environment only once

Page 79: Distributed Ruby and Rails

delayed_job setup (use the fork version)

• script/plugin install git://github.com/collectiveidea/delayed_job.git

• script/generate delayed_job

• rake db:migrate

Page 80: Distributed Ruby and Rails

delayed_job example: send_later

def deliver
  mailing = Mailing.find(params[:id])
  mailing.send_later(:deliver)
  flash[:notice] = "Mailing is being delivered."
  redirect_to mailings_url
end

Page 81: Distributed Ruby and Rails

delayed_job example: custom workers

class MailingJob < Struct.new(:mailing_id)
  def perform
    mailing = Mailing.find(mailing_id)
    mailing.deliver
  end
end

# in your controller
def deliver
  Delayed::Job.enqueue(MailingJob.new(params[:id]))
  flash[:notice] = "Mailing is being delivered."
  redirect_to mailings_url
end
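The contract behind custom workers is small: a job is any object that responds to perform, and enqueueing only stores it for a worker to run later. The sketch below illustrates that contract with a hypothetical in-memory TinyJobQueue; it is not delayed_job's implementation, which persists jobs in a database table.

```ruby
# A job is any object that responds to #perform; a Struct keeps it small.
GreetingJob = Struct.new(:name, :log) do
  def perform
    log << "Hello, #{name}!"
  end
end

# Hypothetical stand-in for the jobs table and the worker loop.
class TinyJobQueue
  def initialize
    @jobs = []
  end

  def enqueue(job)
    raise ArgumentError, 'job must respond to #perform' unless job.respond_to?(:perform)

    @jobs << job
  end

  # Run every pending job in FIFO order and return how many ran.
  def work_off
    count = 0
    while (job = @jobs.shift)
      job.perform
      count += 1
    end
    count
  end
end

log = []
queue = TinyJobQueue.new
queue.enqueue(GreetingJob.new('ihower', log))   # returns immediately
queue.enqueue(GreetingJob.new('Rails', log))
ran = queue.work_off                            # a worker would do this later
puts log.inspect
```

Keeping only an id in the job (as MailingJob does with mailing_id) means the worker loads fresh data when it finally runs.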

Page 82: Distributed Ruby and Rails

delayed_job example: always asynchronously

class Device
  def deliver
    # long running method
  end
  handle_asynchronously :deliver
end

device = Device.new
device.deliver

Page 83: Distributed Ruby and Rails

Running jobs

• rake jobs:work (Don't use it in production; it will exit if the database has any network connectivity problems.)

• RAILS_ENV=production script/delayed_job start

• RAILS_ENV=production script/delayed_job stop

Page 84: Distributed Ruby and Rails

Priority
just an Integer, default is 0

Delayed::Job.enqueue(MailingJob.new(params[:id]), 3)

Delayed::Job.enqueue(MailingJob.new(params[:id]), -3)

• you can run multiple workers to handle jobs of different priorities

• RAILS_ENV=production script/delayed_job --min-priority 3 start

Page 85: Distributed Ruby and Rails

Scheduled
no guarantee of a precise time; the job just runs some time after run_at

Delayed::Job.enqueue(MailingJob.new(params[:id]), 3, 3.days.from_now)

Delayed::Job.enqueue(MailingJob.new(params[:id]), 3, 1.month.from_now.beginning_of_month)

Page 86: Distributed Ruby and Rails

Configuring Delayed Job

# config/initializers/delayed_job_config.rb
Delayed::Worker.destroy_failed_jobs = false
Delayed::Worker.sleep_delay = 5        # seconds to sleep when the queue is empty
Delayed::Worker.max_attempts = 25
Delayed::Worker.max_run_time = 4.hours # set to the time your longest task will take

Page 87: Distributed Ruby and Rails

Automatic retry on failure

• If a method throws an exception, it will be caught and the method rerun later.

• The method will be retried up to 25 (default) times at increasingly longer intervals until it passes.

• The longest interval is about 108 hours; the next run time is
  Job.db_time_now + (job.attempts ** 4) + 5
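Evaluating the formula for a few attempt counts shows how quickly the delay grows. This is just arithmetic on the expression quoted above; the method name retry_delay is made up, and the result is in seconds.

```ruby
# Delay in seconds before the next retry: (attempts ** 4) + 5.
def retry_delay(attempts)
  (attempts**4) + 5
end

[1, 2, 5, 10, 25].each do |attempts|
  seconds = retry_delay(attempts)
  puts format('attempt %2d: %6d s (%.1f h)', attempts, seconds, seconds / 3600.0)
end
```

With attempts at 25 the delay is 390630 seconds, roughly 108.5 hours, which matches the 108-hour figure on the slide.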

Page 88: Distributed Ruby and Rails

Capistrano Recipes

• Remember to restart delayed_job after deployment

• Check out lib/delayed_job/recipes.rb

after "deploy:stop", "delayed_job:stop"
after "deploy:start", "delayed_job:start"
after "deploy:restart", "delayed_job:restart"

Page 89: Distributed Ruby and Rails

Resque
http://github.com/defunkt/resque

• a Redis-backed library for creating background jobs, placing those jobs on multiple queues, and processing them later.

• Github's open source project

• you can only enqueue JSONable Ruby objects

• includes a Sinatra app for monitoring what's going on

• supports multiple queues

• a good fit if you expect a lot of failure/chaos

Page 90: Distributed Ruby and Rails

My recommendations:

• General purpose: delayed_job (Github highly recommends DelayedJob to anyone whose site is not 50% background work.)

• Time-scheduled: cron + rake

Page 91: Distributed Ruby and Rails

5. SOA for Rails

• What’s SOA

• Why SOA

• Considerations

• The tool set

Page 92: Distributed Ruby and Rails

What's SOA
Service-oriented architecture

• the "monolithic" approach is not enough

• SOA is a way to design complex applications by splitting out major components into individual services and communicating via APIs.

• a service is a vertical slice of functionality: database, application code and caching layer

Page 93: Distributed Ruby and Rails

a monolithic web app example

[Diagram: a request reaches the WebApps through a Load Balancer; the WebApps share one Database]

Page 94: Distributed Ruby and Rails

a SOA example

[Diagram: incoming requests, a Load Balancer, the WebApps for User and the WebApp for Administration, and Services A and Services B, each with its own Database]

Page 95: Distributed Ruby and Rails

Why SOA? Isolation

• Shared Resources

• Encapsulation

• Scalability

• Interoperability

• Reuse

• Testability

• Reduce Local Complexity

Page 96: Distributed Ruby and Rails

Shared Resources

• Different front-end websites use the same resource.

• SOA helps you avoid duplicating databases and code.

• Why not just share the database?

  • code is not DRY

  • caching will be problematic

[Diagram: WebApps for User and WebApp for Administration both connected to one Database]

Page 97: Distributed Ruby and Rails

Encapsulation

• you can change the underlying implementation of a service without affecting other parts of the system

• upgrade library

• upgrade to Ruby 1.9

• you can provide API versioning

Page 98: Distributed Ruby and Rails

Scalability 1: Partitioned Data Provides

• The database is the first bottleneck; a single DB server cannot scale. SOA helps you reduce database load.

• Anti-pattern: only splitting the database

  • model relationships are broken

  • referential integrity is lost

• Myth: database replication. It cannot give you both speed and consistency.

[Diagram: WebApps connected to Database A and Database B]

Page 99: Distributed Ruby and Rails

Scalability 2: Caching

• SOA makes it easier to design a caching system

• Cache data at the right times and expire it at the right times

• Cache the logical model, not the physical one

• You do not need to cache views everywhere

Page 100: Distributed Ruby and Rails

Scalability 3: Efficient

• Different components carry different loads; with SOA you can scale each service separately.

[Diagram: the WebApps call Services A and Services B, each behind its own Load Balancer, with more instances of Services B than of Services A]

Page 101: Distributed Ruby and Rails

Security

• Different services can sit behind different firewalls

• You can expose only the public web apps and services; everything else stays inside the firewall.

Page 102: Distributed Ruby and Rails

Interoperability

• HTTP is the common interface; SOA helps you integrate:

• Multiple languages

• Internal systems, e.g. a full-text search engine

• Legacy databases and systems

• External vendors

Page 103: Distributed Ruby and Rails

Reuse

• Reuse across multiple applications

• Reuse for public APIs

• Example: Amazon Web Services (AWS)

Page 104: Distributed Ruby and Rails

Testability

• Isolate problem

• Mocking API calls

• Reduce the time to run test suite

Page 105: Distributed Ruby and Rails

Reduce Local Complexity

• Team modularity along the same module splits as your software

• Understandability: The amount of code is minimized to a quantity understandable by a small team

• Source code control

Page 106: Distributed Ruby and Rails

Considerations

• Partition into Separate Services

• API Design

• Which Protocol

Page 107: Distributed Ruby and Rails

How to partition into Separate Services

• Partitioning on Logical Function

• Partitioning on Read/Write Frequencies

• Partitioning by Minimizing Joins

• Partitioning by Iteration Speed

Page 108: Distributed Ruby and Rails

API Design

• Send Everything you need

• Parallel HTTP requests

• Send as Little as Possible

• Use Logical Models

Page 109: Distributed Ruby and Rails

Physical Models & Logical Models

• Physical models are mapped to database tables through ORM. (It’s 3NF)

• Logical models are mapped to your business problem. (External API use it)

• Logical models are mapped to physical models by you.
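A small sketch of that mapping, with made-up names (UserRow, OrderRow, customer_summary): two normalized "physical" records are combined into one denormalized "logical" model, which is what the service would return to its clients.

```ruby
require 'json'

# Physical models: one Struct per (normalized) table. Names are hypothetical.
UserRow  = Struct.new(:id, :name)
OrderRow = Struct.new(:id, :user_id, :total_cents)

# Logical model: the denormalized shape an API client actually wants.
def customer_summary(user, orders)
  own_orders = orders.select { |order| order.user_id == user.id }
  {
    'name'        => user.name,
    'order_count' => own_orders.size,
    'total_spent' => own_orders.sum(&:total_cents) / 100.0
  }
end

user   = UserRow.new(1, 'ihower')
orders = [OrderRow.new(10, 1, 1250), OrderRow.new(11, 1, 750), OrderRow.new(12, 2, 999)]

summary = customer_summary(user, orders)
puts JSON.generate(summary)
```

The client never sees the tables, so the storage layout can change while the logical model stays the same.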

Page 110: Distributed Ruby and Rails

Logical Models

• Not relational or normalized

• Maintainability

• can change with no change to data store

• can stay the same while the data store changes

• Better fit for REST interfaces

• Better caching

Page 111: Distributed Ruby and Rails

Which Protocol?

• SOAP

• XML-RPC

• REST

Page 112: Distributed Ruby and Rails

RESTful Web services

• Rails way

• REST is about resources

• URL

• Verbs: GET/PUT/POST/DELETE
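The pairing of a verb with a URL is what selects the action. The sketch below spells out the conventional Rails mapping for a resource as a plain Ruby method; rest_action is a hypothetical helper for illustration, not part of Rails, and the new/edit form routes are left out.

```ruby
# Conventional Rails mapping from HTTP verb + URL shape to controller action.
def rest_action(verb, path)
  member     = path.match?(%r{\A/\w+/\d+\z})   # e.g. /mailings/1
  collection = path.match?(%r{\A/\w+\z})       # e.g. /mailings

  case [verb, member, collection]
  when ['GET', false, true]    then :index
  when ['POST', false, true]   then :create
  when ['GET', true, false]    then :show
  when ['PUT', true, false]    then :update
  when ['DELETE', true, false] then :destroy
  end
end

puts rest_action('GET', '/mailings')
puts rest_action('DELETE', '/mailings/1')
```

The URL names the resource and the verb names what to do with it, so no action name needs to appear in the URL.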

Page 113: Distributed Ruby and Rails

The tool set

• Web framework

• XML Parser

• JSON Parser

• HTTP Client

Page 114: Distributed Ruby and Rails

Web framework

• We do not need much of the controller and view layers

• Rails is a bit more than we need; how about Sinatra?

• Rails metal

Page 115: Distributed Ruby and Rails

ActiveResource

• Mapping RESTful resources as models in a Rails application.

• But not useful in practice, why?

Page 116: Distributed Ruby and Rails

XML parser

• http://nokogiri.org/

• Nokogiri (鋸) is an HTML, XML, SAX, and Reader parser. Among Nokogiri’s many features is the ability to search documents via XPath or CSS3 selectors.

Page 117: Distributed Ruby and Rails

JSON Parser

• http://github.com/brianmario/yajl-ruby/

• An extremely efficient streaming JSON parsing and encoding library. Ruby C bindings to Yajl

Page 118: Distributed Ruby and Rails

HTTP Client

• http://github.com/pauldix/typhoeus/

• Typhoeus runs HTTP requests in parallel while cleanly encapsulating handling logic

Page 119: Distributed Ruby and Rails

Tips

• Define your logical model (i.e. your service request result) first.

• model.to_json and model.to_xml are easy to use, but not useful in practice.

Page 120: Distributed Ruby and Rails

6. Distributed File System

• NFS does not scale

• we can use rsync to duplicate files

• MogileFS

• http://www.danga.com/mogilefs/

• http://seattlerb.rubyforge.org/mogilefs-client/

• Amazon S3

• HDFS (Hadoop Distributed File System)

• GlusterFS

Page 121: Distributed Ruby and Rails

7. Distributed Database

• NoSQL

• CAP theorem

• Eventually consistent

• HBase/Cassandra/Voldemort

Page 122: Distributed Ruby and Rails

The End

Thank you for listening

Page 123: Distributed Ruby and Rails

References

• Books & Articles:

• Distributed Programming with Ruby, Mark Bates (Addison Wesley)

• Enterprise Rails, Dan Chak (O'Reilly)

• Service-Oriented Design with Ruby and Rails, Paul Dix (Addison Wesley)

• RESTful Web Services, Richardson & Ruby (O'Reilly)

• RESTful Web Services Cookbook, Allamaraju & Amundsen (O'Reilly)

• Enterprise Recipes with Ruby on Rails, Maik Schmidt (The Pragmatic Programmers)

• Ruby in Practice, McAnally & Arkin (Manning)

• Building Scalable Web Sites, Cal Henderson (O'Reilly)

• Background Processing in Rails, Erik Andrejko (Rails Magazine)

• Background Processing with Delayed_Job, James Harrison (Rails Magazine)

• Building High-Performance Web Sites (建构高性能Web站点), Guo Xin (Publishing House of Electronics Industry)

• Slides:

• Background Processing (Rob Mack) Austin on Rails - April 2009

• The Current State of Asynchronous Processing in Ruby (Mathias Meyer, Peritor GmbH)

• Asynchronous Processing (Jonathan Dahl)

• Long-Running Tasks In Rails Without Much Effort (Andy Stewart) - April 2008

• Starling + Workling: simple distributed background jobs with Twitter's queuing system, Rany Keddo 2008

• Physical Models & Logical Models in Rails, Dan Chak

Page 124: Distributed Ruby and Rails

References

• Links:

• http://segment7.net/projects/ruby/drb/

• http://www.slideshare.net/luccastera/concurrent-programming-with-ruby-and-tuple-spaces

• http://github.com/blog/542-introducing-resque

• http://www.engineyard.com/blog/2009/5-tips-for-deploying-background-jobs/

• http://www.opensourcery.co.za/2008/07/07/messaging-and-ruby-part-1-the-big-picture/

• http://leemoonsoo.blogspot.com/2009/04/simple-comparison-open-source.html

• http://blog.gslin.org/archives/2009/07/25/2065/

• http://www.javaeye.com/topic/524977

• http://www.allthingsdistributed.com/2008/12/eventually_consistent.html

Page 125: Distributed Ruby and Rails

Todo (maybe next time)

• AMQP/RabbitMQ example code

• How about Nanite?

• XMPP

• MagLev VM

• More MapReduce example code

• How about Amazon Elastic MapReduce?

• Resque example code

• More SOA example and code

• MogileFS example code