Strata Software Architecture NY: The Data Dichotomy
TRANSCRIPT
The Data Dichotomy: Rethinking Data and Services with Streams
Ben Stopford (@benstopford)
What are microservices really about?
GUI
UI Service
Orders Service
Returns Service
Fulfilment Service
Payment Service
Stock Service
Splitting the Monolith? From a single process / code base / deployment to many processes / code bases / deployments
Orders Service
Autonomy?
Orders Service
Stock Service
Email Service
Fulfilment Service
Independently Deployable
Independence is where services get their value
Allows Scaling
Scaling in terms of people
What happens when we grow?
Companies are inevitably a collection of applications. They must work together to some degree.
Interconnection is an afterthought
FTP / Enterprise Messaging etc
Internal World
External World
Service Boundary
Services Force Us To Consider The External World
Internal World
External World
Service Boundary
External World is something we should Design For
But this independence comes at a cost $$$
A Statement Object calls getOpenOrders() on an Orders Object
Less Surface Area = Less Coupling
Encapsulate State
Encapsulate Functionality
Consider Two Objects in one address space
Encapsulation => Loose Coupling
Change 1 Change 2
Redeploy
Singly-deployable apps are easy
A Statement Service calls getOpenOrders() on an Orders Service
Independently Deployable
Synchronized changes are painful
Services work best where requirements are isolated in a single bounded context
A Business Service calls authorise() on Single Sign On
It’s unlikely that a Business Service would need the internal SSO state / function to change
Single Sign On
SSO has a tightly bounded context
But business services are different
Catalog Authorisation
Most business services share the same core stream of facts.
Most services live in here
The futures of business services are far more tightly intertwined.
We need encapsulation to hide internal state and stay loosely coupled.
But we need the freedom to slice & dice shared data like any other dataset
But data systems have little to do with encapsulation
Service and Database each have data on the inside and data on the outside. A service's interface hides data; a database's interface amplifies it.
Databases amplify the data they hold
The data dichotomy Data systems are about exposing data.
Services are about hiding it.
Encapsulation protects us against future change.
Databases provide ultimate flexibility for the now.
These approaches serve different goals
This affects services in one of two ways
getOpenOrders(fulfilled=false, deliveryLocation=CA, orderValue=100, operator=GreaterThan)
Either (1) we constantly add to the interface, as datasets grow
getOrder(Id)
getOrder(UserId)
getAllOpenOrders()
getAllOrdersUnfulfilled(Id)
getAllOrders()
(1) Services can end up looking like kooky home-grown databases
...and DATA amplifies this “God Service” problem
(2) Give up and move whole datasets en masse
getAllOrders()
Data Copied Internally
This leads to a different problem
Data diverges over time
Orders Service
Stock Service
Email Service
Fulfilment Service
The more mutable copies, the more data will diverge over time
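To make the divergence concrete, here is a minimal pure-Python sketch (not from the talk; service names follow the slide): two services each take a mutable copy of the same order and apply their own local updates, after which the copies disagree about the same fact.

```python
# Two services take mutable copies of the same orders data,
# then update those copies independently.
source = {"order-1": {"status": "OPEN", "qty": 5}}

stock_copy = {k: dict(v) for k, v in source.items()}  # Stock Service's copy
email_copy = {k: dict(v) for k, v in source.items()}  # Email Service's copy

# Each service applies its own local changes over time...
stock_copy["order-1"]["qty"] = 4          # stock adjustment
email_copy["order-1"]["status"] = "SENT"  # email marks it processed

# ...and the copies no longer agree on the same order.
diverged = stock_copy["order-1"] != email_copy["order-1"]
print(diverged)  # True: two different versions of the same fact
```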
Cycle of inadequacy:
• Start here: nice, neat services.
• The service contract is too limiting. Can we change both services together, easily? NO.
• Broaden the contract? Eek $$$. NO.
• Frack it! Just give me ALL the data. Is it a shared database? YES.
• Data diverges (many different versions of the same facts). Let's encapsulate; let's centralise to one copy.
• ...and we are back to nice, neat services.
These forces compete in the systems we build
Divergence
Accessibility
Encapsulation
Shared database: better accessibility. Service interfaces: better independence.
No perfect solution
Is there a better way?
Data on the Inside
Data on the Outside
Service Boundary
Make data-on-the-outside a first-class citizen
Separate Reads & Writes with Immutable Streams
Orders Service
Stock Service
Kafka
Fulfilment Service
UI Service
Fraud Service
Event Driven + CQRS + Stateful Stream Processing
Commands go to owning services
Queries are interpreted entirely inside each bounded context
Event Driven + CQRS + Stateful Stream Processing
• Events form a central Shared Narrative
• Writes/Commands => Go to “owning” service
• Reads/Queries => Wholly owned by each Service
• Stream processing toolkit
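The bullets above can be sketched in plain Python (an illustration of the pattern, not Kafka's API; the event shape and service names are assumptions): a command goes to the owning service, which appends an immutable event to the shared log, and each consuming service derives its own read model from that log.

```python
# The shared narrative: an append-only list of immutable events.
log = []

def place_order(order_id, product, qty):
    """Command handler owned by the Orders Service: emits an event."""
    log.append({"type": "OrderPlaced", "id": order_id,
                "product": product, "qty": qty})

def stock_view(events):
    """Stock Service's read model: units reserved per product.
    Each service owns its own query-side view of the same events."""
    reserved = {}
    for e in events:
        if e["type"] == "OrderPlaced":
            reserved[e["product"]] = reserved.get(e["product"], 0) + e["qty"]
    return reserved

place_order("o1", "widget", 3)
place_order("o2", "widget", 2)
print(stock_view(log))  # {'widget': 5}
```

Another service (say, Fraud) would fold the very same events into a completely different view, without ever calling the Orders Service.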
Kafka is a Streaming Platform
[Diagram: the Log at the core, surrounded by Connectors, Producer/Consumer APIs, and a Streaming Engine]
Kafka is a type of messaging system
Writers Kafka Readers
Important properties:
• Petabytes of data
• Scales out linearly
• Built-in fault tolerance
• Log can be replayed
Scaling becomes a concern of the broker, not the service
Services Naturally Load Balance
This provides Fault Tolerance
Reset to any point in the shared narrative
Rewind & Replay
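Rewind-and-replay falls out of the log model almost for free. A hedged sketch (plain Python standing in for a Kafka consumer): a consumer is just an offset into an immutable log, so resetting the offset replays history.

```python
# An immutable, append-only log of events.
log = ["event-1", "event-2", "event-3"]

class Consumer:
    """A consumer is nothing more than a position (offset) in the log."""
    def __init__(self):
        self.offset = 0

    def poll(self):
        batch = log[self.offset:]
        self.offset = len(log)
        return batch

    def seek(self, offset):
        # Reset to any point in the shared narrative.
        self.offset = offset

c = Consumer()
first = c.poll()   # reads everything once
c.seek(0)          # rewind
replayed = c.poll()
print(first == replayed)  # True: the log can be replayed
```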
Compacted Log (retains only latest version)
[Diagram: per-key version histories; after compaction only the latest version of each key remains]
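The compaction rule in the diagram can be stated in a few lines of illustrative Python (a sketch of the semantics, not Kafka's internals): replay the keyed log and keep only the latest value per key.

```python
def compact(log):
    """Retain only the latest version of each key.
    Later entries for a key overwrite earlier ones."""
    latest = {}
    for key, value in log:
        latest[key] = value
    return list(latest.items())

# Keyed writes arrive over time, with several versions per key...
log = [("A", "v1"), ("A", "v2"), ("B", "v1"), ("A", "v3"), ("B", "v2")]

# ...but after compaction only the latest version of each survives.
print(compact(log))  # [('A', 'v3'), ('B', 'v2')]
```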
Service Backbone Scalable, Fault Tolerant, Concurrent, Strongly Ordered, Retentive
A place to keep the data-on-the-outside as an immutable narrative
Now add Stream Processing
Kafka’s Streaming API: a general, embeddable streaming engine
What is Stream Processing?
A machine for combining and processing streams of events
Features (similar to a database query engine): Join · Filter · Aggregate · View · Window
Stateful Stream Processing adds Tables & Local State
[Diagram: an event-driven service joins streams and compacted streams from Kafka into local tables/views, alongside its domain logic]
Avg(o.time - p.time)
From orders, payment, user
Group by user.region
Over a 1 day window, emitting every second
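The pseudo-query above is illustrative, and so is this plain-Python sketch of what it computes (windowing is omitted for brevity; the data shapes are assumptions): join orders to payments by order id, join to users for the region, and average the order-to-payment time per region.

```python
from collections import defaultdict

def avg_time_to_payment(orders, payments, users):
    """orders: {order_id: (user_id, order_time)}
    payments: {order_id: payment_time}
    users: {user_id: region}
    Returns the average (payment - order) time per region."""
    sums, counts = defaultdict(float), defaultdict(int)
    for oid, (uid, o_time) in orders.items():
        if oid in payments:              # join orders with payments on order id
            region = users[uid]          # join with users on user id
            sums[region] += payments[oid] - o_time
            counts[region] += 1
    return {r: sums[r] / counts[r] for r in sums}

orders = {"o1": ("u1", 0.0), "o2": ("u2", 10.0)}
payments = {"o1": 5.0, "o2": 30.0}
users = {"u1": "EU", "u2": "EU"}
print(avg_time_to_payment(orders, payments, users))  # {'EU': 12.5}
```

A real streaming engine would additionally scope this aggregation to a sliding time window and emit updated results continuously.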
Stateful Stream Processing
[Diagram: streams feed a stream processing engine, which derives a table used by the domain logic]
Tools make it easy to join streams of events (or tables of data) from different services
Email Service
Legacy App Orders Payments Stock
KSTREAMS
Kafka
So we might join two event streams and create a local store in our own Domain Model
Orders Service
Payment Events
Purchase Events
PurchasePayments
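A stream-stream join of this kind might look like the following sketch (plain Python standing in for a streams API; handler names and event shapes are assumptions): each side buffers events by order id until its counterpart arrives, then the joined record lands in the service's local store.

```python
# Per-side buffers for events that have not yet found their match.
purchases_buf, payments_buf = {}, {}

# The service's local materialised store of joined PurchasePayments.
purchase_payments = {}

def try_join(order_id):
    """Emit a joined record once both sides have arrived for this key."""
    if order_id in purchases_buf and order_id in payments_buf:
        purchase_payments[order_id] = (purchases_buf[order_id],
                                       payments_buf[order_id])

def on_purchase(order_id, item):
    purchases_buf[order_id] = item
    try_join(order_id)

def on_payment(order_id, amount):
    payments_buf[order_id] = amount
    try_join(order_id)

on_purchase("o1", "widget")
on_payment("o1", 9.99)
print(purchase_payments)  # {'o1': ('widget', 9.99)}
```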
Make the projections queryable
Orders Service
Kafka
Queryable State API
Context Boundary
Blend the concepts of services communicating and services sharing state
Customer · Orders · Catalogue · Payments
Very different to sharing a database
Customer · Orders · Catalogue · Payments
Services communicate and share data via the log
Slicing & Dicing is embedded in each bounded context
But sometimes you have to move data
Easy to drop out to an external database if needed
DB of choice
Kafka Connect
API Fraud Service
Context Boundary
Oracle
Connector API & Open Source Ecosystem make this easier
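As a rough illustration, moving a topic into an external database with Kafka Connect is mostly configuration. The sketch below uses Confluent's JDBC sink connector; the connector name, topic, and connection URL are hypothetical, and exact property names should be checked against the connector's documentation.

```properties
# Hypothetical JDBC sink: stream the "orders" topic into an Oracle table.
name=orders-oracle-sink
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
tasks.max=1
topics=orders
connection.url=jdbc:oracle:thin:@//db-host:1521/ORCL
```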
Cassandra
WIRED Principles
• Wimpy: Start simple, lightweight and fault tolerant.
• Immutable: Build a retentive, shared narrative.
• Reactive: Leverage asynchronicity. Focus on the now.
• Evolutionary: Use only the data you need today.
• Decentralized: Receiver driven. Coordination avoiding. No God services.
So…
Architecture is really about designing for change
The data dichotomy Data systems are about exposing data.
Services are about hiding it.
Remember:
Shared data is a reality for most organisations
Embrace data that lives and flows between services
Canonical Immutable Streams
Embed the ability to slice and dice into each bounded context
Distributed Log · Event Driven Service
Shared database: better accessibility. Service interfaces: better independence.
Balance the Data Dichotomy
Create a canonical shared narrative
Use stream processing to handle data-on-the-outside whilst prioritizing the NOW
Event Driven Services
The Log (streams & tables)
Ingestion Services
Services with polyglot persistence
Simple Services
Streaming Services
Twitter: @benstopford
Blog of this talk at http://confluent.io/blog