Microservices in the Apache Kafka Ecosystem
Microservices in a Streaming World
Ben Stopford, Engineer, Confluent
What we'll cover: Event Driven Microservices; The toolset: Kafka, KStreams, Connect; 10 Principles for Streaming Services
Microservices in the Kafka Ecosystem
Today we're going to talk about two fields which sit somewhat apart: Stream Processing & Business Systems.
Services encourage us to share responsibilities. To rely on others. To collaborate and evolve a business's view of the world over time. A world that is inherently decentralised. Inherently spread across many interconnected systems. Spreading many disparate subsets of our business's state.
Yet our approach rarely reflects this. We tend to think in centralised ways, we accumulate data. We protect it. We control it. We hoard it. But it does not need to be this way.
Microservices [Diagram: GUI, UI Service, Orders Service, Returns Service, Fulfilment Service, Payment Service, Stock Service]
So why Services?
The Monolith: Can we do reuse, encapsulation?
The Monolith: What happens when we grow?
Companies are inevitably a collection of applications. They must work together to some degree.
Inverse Conway Maneuver
The 'Inverse Conway Maneuver' recommends evolving your team and organizational structure to promote your desired architecture. [Diagram: Org Structure, Software Architecture]
Eschew shared, mutable state
Service based approaches separate state
But state must inevitably be shared between services
Use a toolkit that embraces decentralisation
My name is Ben Stopford. I work as an engineer on Apache Kafka. In a previous life I did the most centralised thing you could possibly do. I built a single, central database for a large financial institution. This talk is about exploring the exact opposite approach to this problem. It's about thinking of the world in terms of the evolution and provisioning of state across many systems. It's about embracing decentralisation. It's about thinking in streams.
Some simple patterns of distributed systems
Request / Response
When do we need Request Response? Looking things up
What is in my shopping basket?
Event Driven: Async / Fire and Forget / Brokered
Businesses are often modeled as a sequence of events.
Add to cart
Buy
Make Payment
Update Stock
Email Conf
Package Item
Dispatch
Fraud
When do we need Event Driven?
SOA / Microservices
Message Broker
Event Based / Request/Response
Hybrids: Event-Based / Request/Response
Request Response [Diagram: "Mmm, new iPad", UI Service, REST etc., Buy Items, Order Service, Stock Service, Fulfilment Service, Fraud Detection]
Event Driven [Diagram: "Mmm, new iPad", UI Service, Buy Items, Order Service, Stock Service, Fulfilment Service, Fraud Detection, Message Broker]
Hybrid [Diagram: "Mmm, new iPad", UI Service, REST, Buy Items, Order Service, Stock Service, Fulfilment Service, Fraud Detection, Message Broker]
As software engineers we are inevitably affected by the tools we surround ourselves with.
Languages, frameworks, even processes all act to shape the software we build.
The tools we choose have a big effect on our architecture [Diagram: GUI, UI Service, Orders Service, Returns Service, Fulfilment Service, Payment Service, Stock Service]
Event Driven / Request Response [Diagram: GUI, UI Service, Orders Service, Returns Service, Fulfilment Service, Payment Service, Stock Service]
Kafka is well suited to Event Driven Architectures
It will lead you down that path
The Tool Set
a Distributed Log
What is a Distributed Log?
Shard on the way in
[Diagram: Producing Services, Kafka, Consuming Services]
Each shard is a queue
Consumers share load
Reduces to a globally ordered queue
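The sharding and load-sharing slides above can be sketched in a few lines of plain Python. This is only an in-memory illustration, not Kafka's client API: the names partition_for, produce and assign, the shard count, and the use of CRC32 as the hash are all assumptions made for the example.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # hypothetical shard count chosen for this sketch


def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Map a key to a shard. CRC32 is a stand-in, not Kafka's actual hash."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def produce(log, key, value):
    """Append a record to the shard chosen by its key and return that shard."""
    shard = partition_for(key)
    log[shard].append((key, value))
    return shard


def assign(shards, consumers):
    """Share shards round-robin across the instances of one consuming service."""
    assignment = {consumer: [] for consumer in consumers}
    for i, shard in enumerate(shards):
        assignment[consumers[i % len(consumers)]].append(shard)
    return assignment


log = defaultdict(list)
# Every event for the same key lands in the same shard, in append order.
shards_used = {produce(log, "order-42", f"event-{n}") for n in range(5)}
two_instances = assign(list(range(NUM_PARTITIONS)), ["orders-1", "orders-2"])
one_instance = assign(list(range(NUM_PARTITIONS)), ["orders-1"])
```

Ordering here holds within a shard only, which is why events that must stay in sequence share a key. With a single shard the structure degenerates to one ordered queue.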
Load Balanced Services
Fault Tolerant Services
Build Always On Services: Rely on Fault Tolerant Broker
Services can Rewind & Replay the log
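Rewind and replay can be pictured as a reader that tracks its own position in an append-only list. The sketch below is an assumption-laden toy (LogReader, stock_levels and the sample events are hypothetical names and data), meant only to show that reading does not consume the log, so a service can go back and rebuild what it derived.

```python
class LogReader:
    """Reads an append-only log by position; reading never deletes records."""

    def __init__(self, log):
        self.log = log
        self.offset = 0  # index of the next record to read

    def poll(self):
        """Return everything from the current position and advance past it."""
        records = self.log[self.offset:]
        self.offset = len(self.log)
        return records

    def seek(self, offset):
        """Rewind (or skip ahead) to an absolute position in the log."""
        self.offset = offset


def stock_levels(events):
    """Fold stock events into current levels; safe to rerun on a replay."""
    levels = {}
    for item, delta in events:
        levels[item] = levels.get(item, 0) + delta
    return levels


log = [("ipad", +10), ("ipad", -1), ("case", +5)]
reader = LogReader(log)
first_pass = stock_levels(reader.poll())
nothing_new = reader.poll()       # already caught up
reader.seek(0)                    # rewind to the start
replayed = stock_levels(reader.poll())
```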
Compacted Log (retains only latest version)
[Diagram: three version histories: Versions 1 to 3, Versions 1 to 2, Versions 1 to 5]
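A minimal sketch of what "retains only latest version" means, assuming records are (key, value) pairs in a list. The function name compact and the sample keys are illustrative; a real broker compacts in the background rather than in one pass like this.

```python
def compact(log):
    """Keep only the most recent record per key, preserving log order."""
    latest_offset = {}
    for offset, (key, _value) in enumerate(log):
        latest_offset[key] = offset  # later records overwrite earlier ones
    return [rec for offset, rec in enumerate(log) if latest_offset[rec[0]] == offset]


versions = [
    ("A", "v1"), ("B", "v1"), ("A", "v2"),
    ("C", "v1"), ("A", "v3"), ("B", "v2"),
]
compacted = compact(versions)
```

After compaction the log holds one record per key, so it can be read like a table of current values.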
A database engine for data-in-flight
Continuously Running Queries: Max(price) from orders where ccy=GBP over 1 day window emitting every second
What is a stream processing engine? [Diagram: Database (Data, Index, Query Engine; finite source) vs Stream Processor (Query Engine; infinite source)]
Windowing: for unordered or unpredictable streams. Sliding, Fixed (tumbling)
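The Max(price) query above can be approximated with a fixed (tumbling) window keyed by event time. This is a sketch over a finite list, not a continuously running query; the tuple layout (timestamp in seconds, currency, price) and the function name are assumptions for the example.

```python
WINDOW_SECONDS = 24 * 60 * 60  # the "1 day window" from the query


def max_price_per_window(orders, ccy="GBP", window=WINDOW_SECONDS):
    """Max price per tumbling window for one currency, keyed by window start."""
    result = {}
    for ts, order_ccy, price in orders:
        if order_ccy != ccy:
            continue  # the "where ccy=GBP" filter
        window_start = ts - (ts % window)
        result[window_start] = max(price, result.get(window_start, price))
    return result


orders = [
    (10, "GBP", 5.0),
    (20, "USD", 99.0),
    (WINDOW_SECONDS + 5, "GBP", 7.0),
    (30, "GBP", 9.0),   # arrives late but still belongs to the first window
]
maxima = max_price_per_window(orders)
```

Because each record is bucketed by its own timestamp rather than its arrival order, the late record still updates the correct window, which is the point of windowing for unordered streams.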
Features: similar to database query engine: Join, Filter, Aggregate, View, Window
Stateful Stream Processing [Diagram: stream, Compacted stream, Join; Stream Data, Stream-Tabular Data, Windowed Stream]
Locally Cached Table (disk resident) [Diagram: Kafka, Kafka Streams]
Useful for Enrichment [Diagram: stream, Compacted stream, Join, Orders, Customers, Kafka, Kafka Streams]
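Enrichment by joining a stream against a compacted stream might look like the following in plain Python. It is a sketch of the idea rather than the Kafka Streams API: build_table, enrich_orders and the record fields are hypothetical, and the "locally cached table" is just a dict.

```python
def build_table(changelog):
    """Materialise a compacted stream as a table: last value per key wins."""
    table = {}
    for key, value in changelog:
        table[key] = value
    return table


def enrich_orders(orders, customers):
    """Join each order to the current customer row (None if unknown)."""
    for order in orders:
        yield {**order, "customer": customers.get(order["customer_id"])}


customers_changelog = [
    ("c1", {"name": "Ann", "country": "UK"}),
    ("c2", {"name": "Bob", "country": "US"}),
    ("c1", {"name": "Ann", "country": "FR"}),  # newer version of c1
]
orders = [
    {"order_id": "o1", "customer_id": "c1"},
    {"order_id": "o2", "customer_id": "c9"},
]
enriched = list(enrich_orders(orders, build_table(customers_changelog)))
```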
Scales Out
Embeddable
[Diagram: three Orders Service instances, Kafka]
Kafka Connect: View Replication
Sometimes you need to physically move data
Replicate it, so both copies are identical
Iterate via regeneration
Kafka Connect
[Diagram: Kafka Connect, Kafka Connect, Kafka]
So
Service Backbone: Scalable, Fault Tolerant, Concurrent, Strongly Ordered, Stateful
Embeddable tool for data manipulation
Adapt data streams (transformation, stream maintenance, analytic function)
Replicate Data Sources Exactly
Create Regenerable, Streaming Views
How do we actually do this? 10 (opinionated) principles for Streaming Services
1. Don't use Kafka for shopping carts! (OK, you can, but use sparingly)
[Diagram: Shopping Cart, UI Service] Broker/durability/broadcast add little to request response
Do use Kafka for your core business processing.
Think business events. An order was created, a payment was received, a trade was booked etc. Pub/Sub.
[Diagram: GUI, UI Service, Orders Service, Returns Service, Fulfilment Service, Payment Service, Stock Service]
2. Pick Topics with Business Significance
Orders, Payments, Returns, Invoices
Give your messages meaningful IDs and version them, e.g. OrdersService1-Order-1234-v2. Should relate to the real world. Should be versioned (if mutable).
Include the service name
Note the key used for sharding in Kafka may not be this key
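One way to make such IDs explicit is a small value type that formats and parses them. The field names (service, entity, entity_id, version) are an assumed reading of the example ID OrdersService1-Order-1234-v2, and the sketch assumes no component contains a hyphen.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MessageId:
    service: str     # name of the service that wrote the message
    entity: str      # real-world thing the message is about
    entity_id: str
    version: int     # bumped each time a mutable entity changes

    def __str__(self):
        return f"{self.service}-{self.entity}-{self.entity_id}-v{self.version}"

    @classmethod
    def parse(cls, text):
        service, entity, entity_id, version = text.split("-")
        if not version.startswith("v"):
            raise ValueError(f"bad version component: {version!r}")
        return cls(service, entity, entity_id, int(version[1:]))

    def next_version(self):
        """ID for the next revision of the same entity."""
        return MessageId(self.service, self.entity, self.entity_id, self.version + 1)


parsed = MessageId.parse("OrdersService1-Order-1234-v2")
bumped = str(parsed.next_version())
```

Since the sharding key may differ from this ID, a producer could shard on the entity ID alone so that every version of an order stays in the same shard, in order.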
3. Decouple publishers from subscribers
[Diagram: GUI, UI Service, Orders Service, Returns Service, Fulfilment Service, Payment Service, Stock Service]
Add Request/Response only where needed
[Diagram: GUI, UI Service, Orders Service, Returns Service, Fulfilment Service, Payment Service, Stock Service, REST]
4. Use the log to regenerate state. Avoid journaling incoming events
Event Source side effects. Use offsets to tie these back to the stream. Store in: Kafka, KStreams state store, Other DB
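Tying side effects back to the stream with offsets can be sketched as follows. ConfirmationMailer is a hypothetical service: it records the last offset it acted on next to the side effects themselves, so replaying the log does not repeat them. In this toy both live in memory; the slide suggests Kafka, a KStreams state store, or another database as the real store.

```python
class ConfirmationMailer:
    """Hypothetical service whose side effect is one email per order."""

    def __init__(self):
        self.sent = []          # stands in for the outside world
        self.last_offset = -1   # stored together with the side effects

    def run(self, log):
        for offset, order_id in enumerate(log):
            if offset <= self.last_offset:
                continue  # side effect already recorded for this offset
            self.sent.append(f"confirmation:{order_id}")
            self.last_offset = offset


orders_log = ["order-1", "order-2"]
mailer = ConfirmationMailer()
mailer.run(orders_log)
mailer.run(orders_log)          # replaying the log sends nothing twice
orders_log.append("order-3")
mailer.run(orders_log)          # only the new record triggers an email
```

The service keeps no journal of incoming events; the log is the journal, and the stored offset is the only bookkeeping it needs.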
5. Apply the Single Writer Principle
Change at source (by calling that service). Let the change propagate back. Keep local copies read only.
[Diagram: GUI, UI Service, Orders Service, Returns Service, Fulfilment Service, Payment Service, Stock Service]
(1) Change Orders at source
(2) Let the change propagate through
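The two steps above might be modelled like this. The class names, the list standing in for a topic, and the use of MappingProxyType to enforce read-only access are assumptions for illustration; the point is that only the owning service writes, and everyone else follows the stream.

```python
from types import MappingProxyType


class OrdersService:
    """The single writer: the only place orders are changed."""

    def __init__(self, topic):
        self._orders = {}
        self._topic = topic

    def update(self, order_id, status):
        self._orders[order_id] = status
        self._topic.append((order_id, status))  # (1) change at source, then publish


class LocalOrdersView:
    """A read-only local copy kept current by following the topic."""

    def __init__(self):
        self._data = {}
        self._offset = 0
        self.orders = MappingProxyType(self._data)  # callers cannot write through this

    def catch_up(self, topic):
        for order_id, status in topic[self._offset:]:
            self._data[order_id] = status           # (2) the change propagates through
        self._offset = len(topic)


topic = []
orders = OrdersService(topic)
view = LocalOrdersView()
orders.update("o1", "CREATED")
orders.update("o1", "SHIPPED")
view.catch_up(topic)
```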
6. Leverage keeping datasets inside the broker
[Datasets: customer, country, supplier, ex-rates]
Leverage keeping only the latest version (table view)
[Diagram: three version histories: Versions 1 to 3, Versions 1 to 2, Versions 1 to 5]
Join & Process on the fly [Diagram: stream, Compacted stream, Join, Orders, Customers, Kafka, Kafka Streams]
7. Prefer stream processing over maintaining historic views
[Diagram: UI Service, Orders Service, Returns Service, Fulfilment Service, Payment Service, Stock Service, Orders]
Historic copies diverge
Join & Process on the fly [Diagram: stream, Compacted stream, Join, Orders, Customers, Kafka, Kafka Streams]
8. Sometimes you need historic views. => Replicate & Keep Read Only
Replicate
[Diagram: Kafka Connect, Kafka Connect, Kafka] Keep me Read Only
Iterate
Polyglotic Persistence
9. Use Schemas (especially if data is retained)
Schemaless data doesn't age well
Messages
Confluent Schema Registry can help
[Diagram: Orders Service, Schema Registry, Email Service, Avro]
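Why retained data needs schemas can be shown with a plain-Python stand-in for schema-based reading. This is not Avro or the Schema Registry API; the schema dict, the REQUIRED marker and the field names are hypothetical. It mimics one familiar rule of schema evolution: a reader fills in a declared default when an old record lacks a newer field, and ignores fields it does not know.

```python
REQUIRED = object()  # marker for fields that have no default

# Hypothetical version 2 of an order schema: "currency" was added with a default.
ORDER_SCHEMA_V2 = {"order_id": REQUIRED, "amount": REQUIRED, "currency": "GBP"}


def read_with_schema(record, schema):
    """Project a stored record onto the reader's schema, applying defaults."""
    out = {}
    for field, default in schema.items():
        if field in record:
            out[field] = record[field]
        elif default is REQUIRED:
            raise ValueError(f"missing required field: {field}")
        else:
            out[field] = default  # old record written before the field existed
    return out


old_record = {"order_id": "1234", "amount": 10.0}   # written under version 1
new_record = {"order_id": "1235", "amount": 5.0, "currency": "USD", "note": "gift"}
old_as_v2 = read_with_schema(old_record, ORDER_SCHEMA_V2)
new_as_v2 = read_with_schema(new_record, ORDER_SCHEMA_V2)
```

Without a declared schema, a consumer replaying months-old messages has no reliable way to know which of these shapes it will meet.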
10. Consider Stream Management Services
[Diagram: Stream Management, Kafka] Retaining data => Admin tasks. Similar to the role of a DBA: Data Migration, Repartitioning, Latest/versioned, Environment Management, CQRS
KStreams is a good toolset for this
[Diagram: Stream Management, KStreams, Latest Stream, Versioned Stream]
So
1. Don't use Kafka for shopping carts!
2. Pick Topics with Business Significance
3. Decouple publishers from subscribers
4. Use the log to regenerate state
5. Apply the Single Writer Principle
6. Leverage keeping datasets inside the broker
7. Prefer stream processing over maintaining historic views
8. Sometimes you need historic views. => Replicate Read Only
9. Use Schemas
10. Consider Stream Management Services
Microservices push us away from shared, mutable state
But state needs to be communicated
In an increasingly data-heavy world we need tools to do this efficiently
and in real time.
We need a data-centric toolset to do this [Diagram: All Your Data, Immutable Streams]
[Diagram: Event Driven Services, Stream Management Services, Legacy CDC, View Replication, Polyglotic Persistence, tables, streams, Streaming Services]
Keep it simple, Keep it moving
Thanks! @benstopford