Failure Self Defense: defend your app against failures in a (micro)services world
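require 'httpclient'

module Api
  class Products
    # With the httpclient gem, HTTPClient.new's first argument is a proxy,
    # not a base URL, so the host is kept separately.
    def initialize(host = nil)
      @host = host
      @client = HTTPClient.new
    end

    def all
      @client.get("#{@host}/products")
    end
  end
end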
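require 'rest-client'

def send_checkout
  params = { email: @email, token: @token, ssl_version: :SSLv3 }
  # `params:` inside the headers hash is sent as the query string.
  # Because a block is given, RestClient hands back the response for any
  # status code instead of raising on 4xx/5xx.
  RestClient.post(checkout_url, checkout_xml,
                  params: params,
                  content_type: 'application/xml') { |resp, request, result| resp }
end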
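Neither client above sets its timeouts explicitly, so a slow dependency can hold the calling thread for as long as the library's defaults allow. As a sketch of failing fast, here is a hypothetical `Api::TimedProducts` built on Ruby's standard `Net::HTTP` (swapped in for HTTPClient); the 1-second connect and 2-second read timeouts are illustrative assumptions, not recommendations.

```ruby
require 'net/http'

module Api
  # Hypothetical variant of Api::Products with explicit timeouts.
  class TimedProducts
    OPEN_TIMEOUT = 1 # seconds allowed to open the TCP connection (assumed value)
    READ_TIMEOUT = 2 # seconds allowed for each read of the response (assumed value)

    attr_reader :http

    def initialize(host, port = 443)
      @http = Net::HTTP.new(host, port) # does not connect yet
      @http.use_ssl = (port == 443)
      @http.open_timeout = OPEN_TIMEOUT
      @http.read_timeout = READ_TIMEOUT
    end

    # Returns the response body. A slow service raises Net::OpenTimeout or
    # Net::ReadTimeout within seconds instead of tying up the thread.
    def all
      @http.request(Net::HTTP::Get.new('/products')).body
    end
  end
end
```

Usage would look like `Api::TimedProducts.new('api.example.com').all`, where the host is a placeholder. The caller still has to decide what to do with the timeout error, which is where fallbacks come in.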
[Diagram: Feature / Service map of Products, Orders, Payments, Catalog, Checkout, Pack, Route, Authorize and Charge, with a legend of states: Up, Degraded, Down]
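Knowing your dependencies means knowing which features break when a given service is degraded or down. The sketch below encodes that as data. The feature-to-service grouping is a hypothetical assignment (the diagram does not spell it out), and the "least healthy dependency wins" rule is an assumption for illustration.

```ruby
# Hypothetical feature -> service map; the grouping is illustrative only.
DEPENDENCIES = {
  products: %i[catalog],
  orders:   %i[checkout pack route],
  payments: %i[authorize charge]
}.freeze

# Assumed rule: a feature is only as healthy as its least healthy service.
# A fallback could keep a feature :degraded instead of :down.
def feature_state(feature, service_states)
  states = DEPENDENCIES.fetch(feature).map { |service| service_states.fetch(service) }
  return :down if states.include?(:down)
  return :degraded if states.include?(:degraded)

  :up
end

service_states = {
  catalog: :up, checkout: :up, pack: :degraded, route: :up,
  authorize: :down, charge: :up
}
feature_state(:products, service_states) # => :up
feature_state(:orders, service_states)   # => :degraded
feature_state(:payments, service_states) # => :down
```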
require 'toxiproxy'

RSpec.describe Cache do
  context 'when service UP' do
    before { Cache.put('key', 'value') }

    it 'saves value' do
      expect(Cache.get('key')).to eq('value')
    end
  end

  context 'when service DOWN' do
    it 'raises an error' do
      # Toxiproxy disables the redis proxy for the duration of the block
      Toxiproxy[:redis].down do
        expect { Cache.put('key', 'value') }.to raise_error(Redis::CannotConnectError)
      end
    end
  end
end
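The DOWN spec needs a running Toxiproxy server in front of Redis. For fast unit tests, the same idea (run a block while a dependency is unreachable) can be approximated in-process. `FakeProxy` below is a hypothetical stand-in, not part of Toxiproxy: it forwards calls to any backend object and raises inside a `down` block, so the failure path can be exercised without a network.

```ruby
# Hypothetical in-process stand-in for a Toxiproxy proxy. It forwards calls
# to a backend object, except inside a `down` block, where every forwarded
# call raises as if the connection had been refused.
class FakeProxy
  class ConnectionRefused < StandardError; end

  def initialize(backend)
    @backend = backend
    @down = false
  end

  # Simulates an outage while the block runs, then restores the connection.
  def down
    @down = true
    yield
  ensure
    @down = false
  end

  def method_missing(name, *args, &block)
    return super unless @backend.respond_to?(name)
    raise ConnectionRefused, "#{name}: connection refused" if @down

    @backend.public_send(name, *args, &block)
  end

  def respond_to_missing?(name, include_private = false)
    @backend.respond_to?(name, include_private) || super
  end
end

# Usage: a plain Hash stands in for the Redis backend.
store = FakeProxy.new({})
store.store('key', 'value')
store.fetch('key') # => "value"

store.down do
  store.fetch('key')
rescue FakeProxy::ConnectionRefused => e
  puts e.message # prints "fetch: connection refused"
end
```

This only checks how your code reacts to an error; it does not reproduce real network behavior such as latency or half-open connections, which is what Toxiproxy is for.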
An application with an average response time of 60 ms can process 1,000 requests per minute (RPM) per thread (60,000 ms per minute / 60 ms).
How many threads do we need to handle a throughput of 100,000 RPM?
Imagine that 1% of the traffic times out on a service after 30 seconds: the average response time rises to about 360 ms (0.99 × 60 ms + 0.01 × 30,000 ms = 359.4 ms).
How many threads do we need now to handle 100,000 RPM?
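The arithmetic behind the two questions can be checked with a short sketch using the slides' numbers (60 ms normal response, 30 s timeout, 1% of calls timing out, 100,000 RPM target). The method names are illustrative; Rational numbers keep the results exact so rounding up to whole threads is not thrown off by floating-point error.

```ruby
MS_PER_MINUTE = 60_000

# Average response time (ms) when `timeout_ratio` of the calls hang until
# the timeout fires and the rest answer in `normal_ms`.
def avg_response_ms(normal_ms, timeout_ms, timeout_ratio)
  (1 - timeout_ratio) * normal_ms + timeout_ratio * timeout_ms
end

# Requests one thread can serve per minute (to_r avoids integer division).
def rpm_per_thread(avg_ms)
  MS_PER_MINUTE.to_r / avg_ms
end

# Threads needed to sustain `target_rpm`, rounded up to whole threads.
def threads_needed(target_rpm, avg_ms)
  (target_rpm / rpm_per_thread(avg_ms)).ceil
end

healthy  = avg_response_ms(60, 30_000, Rational(0))      # 60 ms
degraded = avg_response_ms(60, 30_000, Rational(1, 100)) # 1797/5 ms = 359.4 ms

rpm_per_thread(healthy)           # => (1000/1)
threads_needed(100_000, healthy)  # => 100
threads_needed(100_000, degraded) # => 599
threads_needed(100_000, 360)      # => 600 (the slide's rounded figure)
```

So a timeout hit by only 1% of the traffic multiplies the required threads by six, from 100 to roughly 600. Shorter timeouts shrink the `timeout_ms` term directly.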
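require 'redis'

class Cache
  def self.put(key, value)
    service.set(key, value)
  end

  def self.get(key)
    service.get(key)
  end

  # Shared Redis client, created on first use
  def self.service
    @service ||= Redis.new
  end
end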
Cache.put('key', 'value')
Cache.get('key')
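class Cache
  # Returns true when stored, false when Redis is unreachable
  def self.put(key, value)
    service.set(key, value)
    true
  rescue Redis::CannotConnectError => error
    AwesomeLogger.log(error) # record the failure instead of propagating it
    false
  end

  # Returns the cached value, or fallback_value when Redis is unreachable
  def self.get(key, fallback_value = nil)
    service.get(key)
  rescue Redis::CannotConnectError => error
    AwesomeLogger.log(error)
    fallback_value
  end
end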
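The fallback methods above need Redis and a logger to run. The sketch below exercises the same fail-gracefully logic with the service injected instead of hard-wired. `FallbackCache`, `InMemoryService` and `DownService` are hypothetical, a local `CannotConnectError` stands in for `Redis::CannotConnectError`, and errors are collected in a list instead of being logged.

```ruby
# Stand-in for Redis::CannotConnectError so the sketch runs without redis-rb.
class CannotConnectError < StandardError; end

# Hypothetical backends: one healthy, one permanently unreachable.
class InMemoryService
  def initialize
    @data = {}
  end

  def set(key, value)
    @data[key] = value
  end

  def get(key)
    @data[key]
  end
end

class DownService
  def set(_key, _value)
    raise CannotConnectError, 'connection refused'
  end

  def get(_key)
    raise CannotConnectError, 'connection refused'
  end
end

# Same fallback logic as Cache.put/Cache.get, with the service injected.
class FallbackCache
  attr_reader :errors

  def initialize(service)
    @service = service
    @errors = [] # collected here instead of sent to a logger
  end

  def put(key, value)
    @service.set(key, value)
    true
  rescue CannotConnectError => error
    @errors << error
    false
  end

  def get(key, fallback_value = nil)
    @service.get(key)
  rescue CannotConnectError => error
    @errors << error
    fallback_value
  end
end

healthy = FallbackCache.new(InMemoryService.new)
healthy.put('key', 'value') # => true
healthy.get('key')          # => "value"

degraded = FallbackCache.new(DownService.new)
degraded.put('key', 'value')   # => false
degraded.get('key', 'default') # => "default"
degraded.errors.size           # => 2
```

Injecting the service is a design choice for testability: the same class runs against Redis in production and against a stub in unit tests.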
Monitor Service Calls
Timeout rate
Rejected call rate
Short-circuit rate
Failure/success rate
Response times
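Several of the metrics listed above can be collected by wrapping each service call. The hypothetical `CallMonitor` below records successes, failures, timeouts and durations per service; short-circuit and rejected-call rates would come from a circuit breaker or bulkhead, which it does not include. `Timeout.timeout` is used only for brevity; in production the HTTP or Redis client's own timeouts are the safer choice.

```ruby
require 'timeout'

# Hypothetical per-service call monitor.
class CallMonitor
  Stats = Struct.new(:success, :failure, :timeout, :durations)

  def initialize
    @stats = Hash.new { |hash, service| hash[service] = Stats.new(0, 0, 0, []) }
  end

  # Runs the block with a deadline, records the outcome under `service`,
  # and re-raises errors so the caller can still apply a fallback.
  def call(service, timeout_s: 1)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    result = Timeout.timeout(timeout_s) { yield }
    @stats[service].success += 1
    result
  rescue Timeout::Error
    @stats[service].timeout += 1
    raise
  rescue StandardError
    @stats[service].failure += 1
    raise
  ensure
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    @stats[service].durations << elapsed * 1000 # milliseconds
  end

  # Share of calls that ended with `outcome` (:success, :failure or :timeout).
  def rate(service, outcome)
    stats = @stats[service]
    total = stats.success + stats.failure + stats.timeout
    total.zero? ? 0.0 : stats[outcome].fdiv(total)
  end

  # Average response time in milliseconds.
  def average_ms(service)
    durations = @stats[service].durations
    durations.empty? ? 0.0 : durations.sum / durations.size
  end
end

monitor = CallMonitor.new
monitor.call(:catalog) { 'ok' }
begin
  monitor.call(:catalog, timeout_s: 0.05) { sleep 1 }
rescue Timeout::Error
  # a fallback would run here
end
monitor.rate(:catalog, :timeout) # => 0.5
```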
Summary
Know your dependencies: Improve your test suite
Fail Fast: Timeouts
Fail Gracefully: Fallbacks
Don't try if you can't succeed: Circuit Breakers and Bulkheads are friends
Monitor Service Calls: Notice problems
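"Don't try if you can't succeed" is what a circuit breaker implements. Below is a minimal, single-threaded sketch of the pattern; `CircuitBreaker`, its threshold and its reset delay are hypothetical, and the injectable clock exists only to make the behavior easy to test. Production libraries such as Shopify's Semian add thread safety and also provide bulkheads, which cap concurrent calls to one dependency.

```ruby
# Minimal, single-threaded circuit breaker sketch (hypothetical class).
# Closed: calls go through. After `threshold` consecutive failures it opens
# and rejects calls immediately. Once `reset_after` seconds have passed it
# goes half-open and lets one trial call through: success closes it again,
# failure reopens it.
class CircuitBreaker
  class OpenError < StandardError; end

  attr_reader :state, :short_circuits

  def initialize(threshold: 3, reset_after: 30,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @threshold = threshold
    @reset_after = reset_after
    @clock = clock
    @state = :closed
    @failures = 0
    @opened_at = nil
    @short_circuits = 0 # calls rejected without trying the service
  end

  def call
    if @state == :open
      if @clock.call - @opened_at >= @reset_after
        @state = :half_open
      else
        @short_circuits += 1
        raise OpenError, 'circuit open, not calling the service'
      end
    end

    result = yield
    @failures = 0
    @state = :closed
    result
  rescue OpenError
    raise # a rejected call is not a new failure of the service
  rescue StandardError
    @failures += 1
    trip! if @state == :half_open || @failures >= @threshold
    raise
  end

  private

  def trip!
    @state = :open
    @opened_at = @clock.call
  end
end

now = 0
breaker = CircuitBreaker.new(threshold: 2, reset_after: 10, clock: -> { now })
2.times do
  breaker.call { raise IOError, 'service down' }
rescue IOError
  # the caller's fallback would run here
end
breaker.state # => :open

now = 10
breaker.call { :ok } # => :ok (half-open trial call succeeded)
breaker.state        # => :closed
```

Counting `short_circuits` also feeds the "short circuit rate" metric from the monitoring list.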