Strata Software Architecture NY: The Data Dichotomy
TRANSCRIPT
The Data Dichotomy: Rethinking Data and Services with Streams
Ben Stopford (@benstopford)
What are microservices really about?
GUI
UI Service
Orders Service
Returns Service
Fulfilment Service
Payment Service
Stock Service
Splitting the Monolith? From a single process / code base / deployment to many processes / code bases / deployments
Orders Service
Autonomy?
Orders Service
Stock Service
Email Service
Fulfilment Service
Independently Deployable
Independence is where services get their value
Allows Scaling
Scaling in terms of people
What happens when we grow?
Companies are inevitably a collection of applications. They must work together to some degree.
Interconnection is an afterthought
FTP / Enterprise Messaging etc
Internal World
External World
Service Boundary
Services Force Us To Consider The External World
Internal World
External World
Service Boundary
External World is something we should Design For
But this independence comes at a cost $$$
A Statement Object calls getOpenOrders() on an Orders Object
Less Surface Area = Less Coupling
Encapsulate State
Encapsulate Functionality
Consider Two Objects in one address space
Encapsulation => Loose Coupling
Change 1 Change 2
Redeploy
Singly-deployable apps are easy
A Statement Service calls getOpenOrders() on an Orders Service
Independently Deployable
Synchronized changes are painful
Services work best where requirements are isolated in a single bounded context
A Business Service calls authorise() on Single Sign On
It’s unlikely that a Business Service would need the internal SSO state / function to change
Single Sign On
SSO has a tightly bounded context
But business services are different
Catalog Authorisation
Most business services share the same core stream of facts.
Most services live in here
The futures of business services are far more tightly intertwined.
We need encapsulation to hide internal state and stay loosely coupled.
But we need the freedom to slice & dice shared data like any other dataset
But data systems have little to do with encapsulation
Service and Database each have data on the inside and data on the outside. A service's interface hides data; a database's interface amplifies it.
Databases amplify the data they hold
The data dichotomy Data systems are about exposing data.
Services are about hiding it.
Encapsulation protects us against future change.
Databases provide ultimate flexibility for the now.
These approaches serve different goals
This affects services in one of two ways
getOpenOrders(fulfilled=false, deliveryLocation=CA, orderValue=100, operator=GreaterThan)
Either (1) we constantly add to the interface, as datasets grow
getOrder(Id)
getOrder(UserId)
getAllOpenOrders()
getAllOrdersUnfulfilled(Id)
getAllOrders()
(1) Services can end up looking like kooky home-grown databases
...and DATA amplifies this “God Service” problem
(2) Give up and move whole datasets en masse
getAllOrders()
Data Copied Internally
This leads to a different problem
Data diverges over time
Orders Service
Stock Service
Email Service
Fulfilment Service
The more mutable copies, the more data will diverge over time
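To make the divergence concrete, here is a minimal pure-Python sketch (not from the talk; service names follow the slide): two services each take a mutable copy of the same order and apply their own local updates, after which the copies disagree about the same fact.

```python
# Two services take mutable copies of the same orders data,
# then update those copies independently.
source = {"order-1": {"status": "OPEN", "qty": 5}}

stock_copy = {k: dict(v) for k, v in source.items()}  # Stock Service's copy
email_copy = {k: dict(v) for k, v in source.items()}  # Email Service's copy

# Each service applies its own local changes over time...
stock_copy["order-1"]["qty"] = 4          # stock adjustment
email_copy["order-1"]["status"] = "SENT"  # email marks it processed

# ...and the copies no longer agree on the same order.
diverged = stock_copy["order-1"] != email_copy["order-1"]
print(diverged)  # True: two different versions of the same fact
```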
Cycle of inadequacy:
• Start here: nice, neat services.
• The service contract is too limiting. Can we change both services together, easily? NO.
• Broaden the contract? Eek $$$. NO.
• Frack it! Just give me ALL the data. Is it a shared database? YES.
• Data diverges (many different versions of the same facts). Let's encapsulate; let's centralise to one copy.
• ...and we are back to nice, neat services.
These forces compete in the systems we build
Divergence
Accessibility
Encapsulation
Shared database: better accessibility. Service interfaces: better independence.
No perfect solution
Is there a better way?
Data on the Inside
Data on the Outside
Service Boundary
Make data-on-the-outside a first-class citizen
Separate Reads & Writes with Immutable Streams
Orders Service
Stock Service
Kafka
Fulfilment Service
UI Service
Fraud Service
Event Driven + CQRS + Stateful Stream Processing
Commands go to owning services
Queries are interpreted entirely inside each bounded context
Event Driven + CQRS + Stateful Stream Processing
• Events form a central Shared Narrative
• Writes/Commands => Go to “owning” service
• Reads/Queries => Wholly owned by each Service
• Stream processing toolkit
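The bullets above can be sketched in plain Python (an illustration of the pattern, not Kafka's API; the event shape and service names are assumptions): a command goes to the owning service, which appends an immutable event to the shared log, and each consuming service derives its own read model from that log.

```python
# The shared narrative: an append-only list of immutable events.
log = []

def place_order(order_id, product, qty):
    """Command handler owned by the Orders Service: emits an event."""
    log.append({"type": "OrderPlaced", "id": order_id,
                "product": product, "qty": qty})

def stock_view(events):
    """Stock Service's read model: units reserved per product.
    Each service owns its own query-side view of the same events."""
    reserved = {}
    for e in events:
        if e["type"] == "OrderPlaced":
            reserved[e["product"]] = reserved.get(e["product"], 0) + e["qty"]
    return reserved

place_order("o1", "widget", 3)
place_order("o2", "widget", 2)
print(stock_view(log))  # {'widget': 5}
```

Another service (say, Fraud) would fold the very same events into a completely different view, without ever calling the Orders Service.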
Kafka is a Streaming Platform
[Diagram: the Log at the core, surrounded by Connectors, Producer/Consumer APIs, and a Streaming Engine]
Kafka is a type of messaging system
Writers Kafka Readers
Important properties:
• Petabytes of data
• Scales out linearly
• Built-in fault tolerance
• Log can be replayed
Scaling becomes a concern of the broker, not the service
Services Naturally Load Balance
This provides Fault Tolerance
Reset to any point in the shared narrative
Rewind & Replay
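Rewind-and-replay falls out of the log model almost for free. A hedged sketch (plain Python standing in for a Kafka consumer): a consumer is just an offset into an immutable log, so resetting the offset replays history.

```python
# An immutable, append-only log of events.
log = ["event-1", "event-2", "event-3"]

class Consumer:
    """A consumer is nothing more than a position (offset) in the log."""
    def __init__(self):
        self.offset = 0

    def poll(self):
        batch = log[self.offset:]
        self.offset = len(log)
        return batch

    def seek(self, offset):
        # Reset to any point in the shared narrative.
        self.offset = offset

c = Consumer()
first = c.poll()   # reads everything once
c.seek(0)          # rewind
replayed = c.poll()
print(first == replayed)  # True: the log can be replayed
```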
Compacted Log (retains only latest version)
[Diagram: per-key version histories; after compaction only the latest version of each key remains]
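The compaction rule in the diagram can be stated in a few lines of illustrative Python (a sketch of the semantics, not Kafka's internals): replay the keyed log and keep only the latest value per key.

```python
def compact(log):
    """Retain only the latest version of each key.
    Later entries for a key overwrite earlier ones."""
    latest = {}
    for key, value in log:
        latest[key] = value
    return list(latest.items())

# Keyed writes arrive over time, with several versions per key...
log = [("A", "v1"), ("A", "v2"), ("B", "v1"), ("A", "v3"), ("B", "v2")]

# ...but after compaction only the latest version of each survives.
print(compact(log))  # [('A', 'v3'), ('B', 'v2')]
```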
Service Backbone Scalable, Fault Tolerant, Concurrent, Strongly Ordered, Retentive
A place to keep the data-on-the-outside as an immutable narrative
Now add Stream Processing
Kafka’s Streaming API: a general, embeddable streaming engine
What is Stream Processing?
A machine for combining and processing streams of events
Features (similar to a database query engine): Join · Filter · Aggregate · View · Window
Stateful Stream Processing adds Tables & Local State
[Diagram: an event-driven service joins streams and compacted streams from Kafka into local tables/views, alongside its domain logic]
Avg(o.time - p.time)
From orders, payment, user
Group by user.region
Over a 1 day window, emitting every second
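The pseudo-query above is illustrative, and so is this plain-Python sketch of what it computes (windowing is omitted for brevity; the data shapes are assumptions): join orders to payments by order id, join to users for the region, and average the order-to-payment time per region.

```python
from collections import defaultdict

def avg_time_to_payment(orders, payments, users):
    """orders: {order_id: (user_id, order_time)}
    payments: {order_id: payment_time}
    users: {user_id: region}
    Returns the average (payment - order) time per region."""
    sums, counts = defaultdict(float), defaultdict(int)
    for oid, (uid, o_time) in orders.items():
        if oid in payments:              # join orders with payments on order id
            region = users[uid]          # join with users on user id
            sums[region] += payments[oid] - o_time
            counts[region] += 1
    return {r: sums[r] / counts[r] for r in sums}

orders = {"o1": ("u1", 0.0), "o2": ("u2", 10.0)}
payments = {"o1": 5.0, "o2": 30.0}
users = {"u1": "EU", "u2": "EU"}
print(avg_time_to_payment(orders, payments, users))  # {'EU': 12.5}
```

A real streaming engine would additionally scope this aggregation to a sliding time window and emit updated results continuously.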
Stateful Stream Processing
[Diagram: streams feed a stream processing engine, which derives a table used by the domain logic]
Tools make it easy to join streams of events (or tables of data) from different services
Email Service
Legacy App Orders Payments Stock
KSTREAMS
Kafka
So we might join two event streams and create a local store in our own Domain Model
Orders Service
Payment Events
Purchase Events
PurchasePayments
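A stream-stream join of this kind might look like the following sketch (plain Python standing in for a streams API; handler names and event shapes are assumptions): each side buffers events by order id until its counterpart arrives, then the joined record lands in the service's local store.

```python
# Per-side buffers for events that have not yet found their match.
purchases_buf, payments_buf = {}, {}

# The service's local materialised store of joined PurchasePayments.
purchase_payments = {}

def try_join(order_id):
    """Emit a joined record once both sides have arrived for this key."""
    if order_id in purchases_buf and order_id in payments_buf:
        purchase_payments[order_id] = (purchases_buf[order_id],
                                       payments_buf[order_id])

def on_purchase(order_id, item):
    purchases_buf[order_id] = item
    try_join(order_id)

def on_payment(order_id, amount):
    payments_buf[order_id] = amount
    try_join(order_id)

on_purchase("o1", "widget")
on_payment("o1", 9.99)
print(purchase_payments)  # {'o1': ('widget', 9.99)}
```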
Make the projections queryable
Orders Service
Kafka
Queryable State API
Context Boundary
Blend the concepts of services communicating and services sharing state
Customer · Orders · Catalogue · Payments
Very different to sharing a database
Customer · Orders · Catalogue · Payments
Services communicate and share data via the log
Slicing & Dicing is embedded in each bounded context
But sometimes you have to move data
Easy to drop out to an external database if needed
DB of choice
Kafka Connect
API Fraud Service
Context Boundary
Oracle
Connector API & Open Source Ecosystem make this easier
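As a rough illustration, moving a topic into an external database with Kafka Connect is mostly configuration. The sketch below uses Confluent's JDBC sink connector; the connector name, topic, and connection URL are hypothetical, and exact property names should be checked against the connector's documentation.

```properties
# Hypothetical JDBC sink: stream the "orders" topic into an Oracle table.
name=orders-oracle-sink
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
tasks.max=1
topics=orders
connection.url=jdbc:oracle:thin:@//db-host:1521/ORCL
```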
Cassandra
WIRED Principles
• Wimpy: Start simple, lightweight and fault tolerant.
• Immutable: Build a retentive, shared narrative.
• Reactive: Leverage asynchronicity. Focus on the now.
• Evolutionary: Use only the data you need today.
• Decentralized: Receiver driven. Coordination avoiding. No God services.
So…
Architecture is really about designing for change
The data dichotomy Data systems are about exposing data.
Services are about hiding it.
Remember:
Shared data is a reality for most organisations
Embrace data that lives and flows between services
Canonical Immutable Streams
Embed the ability to slice and dice into each bounded context
Distributed Log · Event Driven Service
Shared database: better accessibility. Service interfaces: better independence.
Balance the Data Dichotomy
Create a canonical shared narrative
Use stream processing to handle data-on-the-outside whilst prioritizing the NOW
Event Driven Services
The Log (streams & tables)
Ingestion Services
Services with polyglot persistence
Simple Services
Streaming Services
Twitter: @benstopford
Blog of this talk at http://confluent.io/blog