slash n: tech talk track 2 – distributed transactions in soa - yogi kulkarni, ganesh subramanian
Distributed Transactions in SOA: Dealing with data consistency issues
Yogi Kulkarni
Ganesh Subramanian
Context
• Experiences from a recent project to rebuild Flipkart's Supply Chain Platform as a SOA
• 20 new services
o Separate databases, data access only through public API
o HTTP + JSON API
Flipkart Supply Chain: High Level View
(Diagram: Order Management, Fulfillment, Procurement, Warehouse, Doc, Supplier, Shipping and Accounting services, each exposing an HTTP + JSON API.)
Challenges
• Strict data consistency requirements
o Cannot afford to lose Orders, Payments, Invoices, Inventory, Shipments, etc.
• Business transactions involving multi-service updates
• Have to deal with "partial failures"
Failure Scenario 1
Other service "disappears" after it was asked to do something
(Diagram: Service A and Service B, with calls do_X and do_Y; the outcome is marked "?".)
• B crashed
• Network error
• Retry?
• What if B had already processed the request?
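The retry problem in this scenario can be illustrated with a small simulation. This is only a sketch under assumptions: `ServiceB`, `LossyNetwork` and `call_b_with_retry` are hypothetical names invented for the example, and the "network" is an in-process object that delivers the request but drops the reply.

```python
class ServiceB:
    """Toy stand-in for Service B: counts how often the request was applied."""

    def __init__(self):
        self.times_applied = 0

    def handle_request(self):
        self.times_applied += 1
        return "done"


class LossyNetwork:
    """Delivers every request to B but drops the first `drop_replies` replies."""

    def __init__(self, service, drop_replies):
        self.service = service
        self.drop_replies = drop_replies

    def call(self):
        reply = self.service.handle_request()  # B has already done the work
        if self.drop_replies > 0:
            self.drop_replies -= 1
            raise TimeoutError("no reply from B")
        return reply


def call_b_with_retry(network, max_attempts=3):
    """Service A's side: call B and blindly retry on timeout."""
    for attempt in range(1, max_attempts + 1):
        try:
            return network.call(), attempt
        except TimeoutError:
            continue  # A cannot tell "B crashed" from "reply was lost"
    raise RuntimeError("B unreachable")


b = ServiceB()
reply, attempts = call_b_with_retry(LossyNetwork(b, drop_replies=1))
print(reply, attempts, b.times_applied)
```

A sees one successful call on the second attempt, yet B has applied the request twice. Without some form of duplicate detection on B's side, a blind retry is unsafe.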
Failure Scenario 2
Caller crashed after the other service did what it was told to do
(Diagram: Service A and Service B, with calls do_X and do_Y.)
Caller crashed. On restarting:
- Should it ask B to undo what it did?
- Should it resume from where it crashed?
"Transaction" concept?
Distributed Transactions to the Rescue
Types of distributed transactions
- XA transactions using 2-phase commit
- Stateless Queued transactions*
- Stateful Queued transactions*
* From: Transactional Information Systems - Weikum & Vossen http://books.google.co.in/books?id=wV5Ran71zNoC
Keeping data consistent in the face of highly concurrent
access and despite all sorts of failures*
XA Transactions
(Diagram: an Application and a Distributed Transaction Coordinator running 2-phase commit across three Resource Managers: two DBs and a Message Queue.)
XA Transactions
Pros
o Simple programming model for caller
Cons
o Breaks service encapsulation if underlying DB participates in the transaction
o Performance issues due to lock hold times
o Can block when coordinator fails
o Service-level 2PC not standardized or mature, e.g. WS-AtomicTransaction Specification from OASIS Group
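For readers unfamiliar with 2-phase commit, the protocol can be sketched as a toy in-memory coordinator. This is an illustrative sketch, not an XA implementation: `Participant` and `two_phase_commit` are hypothetical names, and there is no durable logging, locking or crash recovery.

```python
class Participant:
    """Toy resource manager: votes in phase 1, then commits or rolls back."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self):
        # A real resource manager would take locks and write a durable
        # log record here; it must then hold them until phase 2 arrives.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Return True if every participant committed, False if the txn aborted."""
    # Phase 1: ask every participant to prepare; stop at the first "no" vote.
    prepared = []
    for p in participants:
        if p.prepare():
            prepared.append(p)
        else:
            for q in prepared:
                q.rollback()
            return False
    # Phase 2: all voted "yes", so tell everyone to commit.
    for p in participants:
        p.commit()
    return True


orders_db = Participant("orders_db")
queue = Participant("queue", can_commit=False)
print(two_phase_commit([orders_db, queue]), orders_db.state, queue.state)
```

The window between `prepare` and `commit` is where the cons listed above come from: prepared participants keep their locks while waiting, and if the coordinator dies in that window they stay in the "prepared" state until it returns.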
Stateless Queued Transactions
(Diagram: services A and B, each with its own DB and a Worker.)
1. Transaction Scope: DB Commit + enqueue message
2. Transaction Scope: Dequeue message + DB Commit
Stateless Queued Transactions
Pros
o Works well for simple request-reply interactions
Cons
o Difficult to deal with complex multi-step transactions
o Performance issues if XA transaction is used between DB & MQ
o Idempotency checks need to be implemented to prevent duplicate processing
o Message ordering problem
o 2 "interfaces" - Messages and main HTTP API
Stateful Queued Transactions
Image from: Transactional Information Systems - Weikum & Vossen
Stateful Queued Transactions
Requires a state-machine / workflow-like engine
Pros
o Simplifies implementation of stateful processes spanning multiple services
o Convenient for flows which fork & join
Cons
o State-engine is complex to build
o Open source and commercial tools not a good fit
o BPM, Workflow, ESB tools are integration focussed
o Heavyweight
What we did
• Abstract out messaging
• Services expose a single HTTP interface for sync and async
• Local transactions & relayer to avoid XA issues
• Idempotency check infrastructure
Make it very simple for a service to participate in asynchronous interactions under such transactional guarantees
Most interactions were one-hop request-reply, so we chose Stateless Queued Transactions
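The "local transactions & relayer" idea can be sketched as follows. This is a minimal sketch under assumptions, not Restbus code: it uses an in-memory SQLite database, the `orders` table, the column names and the functions `create_order` and `relay_once` are hypothetical, and `send` stands in for the HTTP call to the message bus. Only the `outbound_messages` table name comes from the slides.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT);
    CREATE TABLE outbound_messages (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        group_id TEXT, body TEXT, relayed INTEGER DEFAULT 0
    );
""")


def create_order(order_id):
    """Insert the order and its outgoing message in ONE local transaction."""
    with db:  # commits on success, rolls back if an exception is raised
        db.execute("INSERT INTO orders VALUES (?, 'created')", (order_id,))
        db.execute(
            "INSERT INTO outbound_messages (group_id, body) VALUES (?, ?)",
            (order_id, json.dumps({"create_payment_for": order_id})),
        )


def relay_once(send):
    """Push unrelayed messages in insertion order; return how many were relayed.

    If `send` fails, the remaining messages of that group are skipped so that
    per-group ordering is kept; they are picked up again on the next poll.
    """
    rows = db.execute(
        "SELECT id, group_id, body FROM outbound_messages "
        "WHERE relayed = 0 ORDER BY id"
    ).fetchall()
    blocked_groups, relayed = set(), 0
    for msg_id, group_id, body in rows:
        if group_id in blocked_groups:
            continue
        try:
            send(msg_id, body)
        except ConnectionError:
            blocked_groups.add(group_id)
            continue
        with db:
            db.execute(
                "UPDATE outbound_messages SET relayed = 1 WHERE id = ?",
                (msg_id,),
            )
        relayed += 1
    return relayed


create_order("O1")
sent = []
relay_once(lambda msg_id, body: sent.append(body))
```

No XA transaction spans the database and the queue here. The trade-off is at-least-once delivery: if the relayer crashes after `send` succeeds but before the row is marked as relayed, the message is sent again, which is why the receiving side needs an idempotency check.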
(Diagram: Service A, Service B and Service C, each with a Restbus Client, connected to the Restbus Server and its MQ.)
(Diagram: inside a service. Labels: Application Business Logic, Restbus Filter, Restbus Relayer, and a Database holding the inbound_messages and outbound_messages tables.)
Restbus Server
Service B
POST /some_resource/:id
{"data":"some data"}
• Don't see restbus headers. Forwarding to App
POST /some_resource/:id
X_RESTBUS_MESSAGE_ID: IDxyz1234
{"data":"some data"}
• Msg id `IDxyz1234` was not processed before. Adding it to inbound_messages
• I have to do an update on Service C! Let me write to outbound_messages
• Msg id `IDxyz1234` was already processed. Sending the stored response back
• Found new messages to be relayed. Order them by group id and send in sequence to restbus
• Restbus was not reachable. I will retry processing the rest of the messages in the group later
• Sending the stored response back
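The idempotency check described in this walkthrough can be sketched as a small filter in front of the application. This is a sketch under assumptions: the header name `X_RESTBUS_MESSAGE_ID` and the `inbound_messages` table name come from the slides, while the table columns, the `idempotency_filter` function and the use of in-memory SQLite are hypothetical.

```python
import sqlite3

MESSAGE_ID_HEADER = "X_RESTBUS_MESSAGE_ID"  # header name shown on the slide

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE inbound_messages (message_id TEXT PRIMARY KEY, response TEXT)"
)


def idempotency_filter(headers, body, app):
    """Run `app(body)` at most once per message id; replay the stored
    response for any later delivery of the same id."""
    message_id = headers.get(MESSAGE_ID_HEADER)
    if message_id is None:
        return app(body)  # no restbus header: plain request, forward to app

    row = db.execute(
        "SELECT response FROM inbound_messages WHERE message_id = ?",
        (message_id,),
    ).fetchone()
    if row is not None:
        return row[0]  # already processed: send the stored response back

    # Assumption: in a real service the app's own DB writes would join this
    # transaction, so the business update and the message record commit
    # together. If `app` raises, nothing is recorded and a retry can proceed.
    with db:
        response = app(body)
        db.execute(
            "INSERT INTO inbound_messages VALUES (?, ?)", (message_id, response)
        )
    return response


calls = []

def app(body):
    calls.append(body)
    return "201 Created"

headers = {MESSAGE_ID_HEADER: "IDxyz1234"}
first = idempotency_filter(headers, '{"data":"some data"}', app)
second = idempotency_filter(headers, '{"data":"some data"}', app)
print(first, second, len(calls))
```

The second delivery returns the same response without invoking the application again. The primary key on `message_id` also guards against two concurrent deliveries of the same message: the later insert fails instead of recording a second result.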
(Diagram: Restbus Client and Restbus Server. Labels: Service A with Instance 1, Instance 2 and Instance 3; MQ; MQ Connector; consumers C1 to C6; Msg Handler; HTTP Connector; Service A, Service B, Service C. Annotation: "Push the message to retry queue".)
(Sequence diagram: Order Management and Accounting exchanging messages through Restbus.
Participants: Order Management (Idempotency Filter, Relayer, Order DB with Inbound Messages and Outbound Messages); Restbus (MQ, Order Consumer, Payment Consumer); Accounting (Idempotency Filter, Relayer, Payments DB with Inbound Messages and Outbound Messages).
Step labels: Begin Txn; Insert Order; Insert create payment Message; Commit Txn; Poll Messages; Get messages to be relayed; Send Message (HTTP); POST /payments (HTTP); Insert Message; Begin Txn; Insert Payment; Push response to queue; Commit Txn; Poll Messages; Push Messages (HTTP); POST /callback (Payment Created) (HTTP); Insert Message; Approve Order.)
Restbus Features
o Pluggable MQ
o End-to-end group-based message ordering
o Dynamic load balancing of consumers in a cluster
o Multi-level retries and sideline queues
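The slides do not describe how multi-level retries and sideline queues work internally, so the following is only one plausible reading, sketched under assumptions: a failed message moves to the next retry level, and a message that fails at the last level is parked on a sideline queue instead of blocking the rest. The function `deliver_with_retries` and its parameters are hypothetical, and the delays a real system would put between levels are omitted.

```python
from collections import deque


def deliver_with_retries(messages, handler, retry_levels=2):
    """Try each message on the main queue, then on `retry_levels` retry queues.

    Messages that still fail at the last level go to the sideline list for
    manual inspection. Returns (delivered, sideline).
    """
    queues = [deque(messages)] + [deque() for _ in range(retry_levels)]
    delivered, sideline = [], []
    for level, queue in enumerate(queues):
        while queue:
            msg = queue.popleft()
            try:
                handler(msg)
                delivered.append(msg)
            except Exception:
                if level + 1 < len(queues):
                    queues[level + 1].append(msg)  # escalate to next level
                else:
                    sideline.append(msg)  # give up: park for inspection
    return delivered, sideline


attempts = {}

def handler(msg):
    # "flaky" fails once and then succeeds; "poison" always fails.
    attempts[msg] = attempts.get(msg, 0) + 1
    if msg == "poison" or (msg == "flaky" and attempts[msg] == 1):
        raise RuntimeError("delivery failed")

delivered, sideline = deliver_with_retries(["ok", "flaky", "poison"], handler)
print(delivered, sideline)
```

This sketch ignores group-based ordering: a retried message is delivered after later messages from the main queue, so a real implementation that promises per-group ordering would also have to hold back the rest of the group.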
Restbus - Tech Stack
o Netty-based Java server
o JMS - SwiftMQ and HornetQ
o ZooKeeper
Restbus at Flipkart
o Live in production for the past 4 months
o Backbone of Flipkart's supply chain system
o 20 services, 200 queues, 30 topics
o 3 million messages & events a day
o Resiliency because of message-persistence, relay, auto-retries, idempotency
Learnings
o Monitoring is critical to deal with increased complexity of async operations
o Service-responsibility partitioning is important to prevent read-then-write race conditions - "tell-don't-ask"
o Synchronous inter-service update calls == lots of partial failures
o "Orchestrator" services get complex fast
o HTTP is excellent integration glue
o Unified interface
o HTTP semantics directly usable