TRANSCRIPT
Monolithic Batch Goes Microservice Streaming
A story about one transformation
Charles Tye & Anton Polyakov
Who are We?
Anton Polyakov
Head of Application Development
2 years in Nordea

Charles Tye
Head of Core Services & Risk IT
17 years in Nordea

Develop solutions for Market Risk, Credit Risk, Liquidity Risk, Stress Testing, Messaging

Together with around 70 other people from all over the world
What We Do
Market Risk
The high level view

Quantify potential losses and exposures
Do many small risks add up to a big risk?
Can risks combine in unusual and unexpected ways?
Market Risk
Line of Defence

Protect Nordea and our customers
Daily internal reporting and external reporting to regulators
Independent function
Analysis and insight into the sources of risk
Control of risk
Management of capital
Examples of Risk Analysis
Value at Risk

Look at the last 2 years of market history
Average of the worst 1% of outcomes
Simulate if the same thing happened again today
Highly non-linear, but there is a requirement to drill in and find the drivers
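The measure described above (average of the worst 1% of simulated outcomes) can be sketched in a few lines. This is an illustrative toy, not Nordea's calculator; all numbers are made up.

```python
def historical_var(pnl_outcomes, tail_fraction=0.01):
    """Average of the worst `tail_fraction` of P&L outcomes, reported as a positive loss."""
    worst_first = sorted(pnl_outcomes)  # most negative P&L first
    tail_size = max(1, int(len(worst_first) * tail_fraction))
    tail = worst_first[:tail_size]
    return -sum(tail) / tail_size

# ~500 trading days of fake daily P&L: mostly small moves, a few large losses
pnl = [(-1) ** d * (d % 7) * 1000 for d in range(500)]
pnl[10] = -250_000
pnl[200] = -300_000

print(historical_var(pnl, 0.01))  # → 113600.0
```

The worst 1% of 500 days is 5 days, so the two large shocks dominate the average; this is why a single bad scenario can drive the whole number, and why drill-down into the drivers matters.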
Examples of Risk Analysis
Stress Scenarios

“Black Swan” worst case scenarios
Unexpected outcomes from future events
Example: Brexit. Simulate if it happened
An Interesting Technology Problem
Consistent: everything has to be included in the risk analysis, so you must know when you are complete
Non-linear: risk does not sum over hierarchies; drill-down is non-trivial; traditional OLAP aggregate-and-increment doesn’t work
Volume: 10,000,000,000,000
Speed: reactive near real-time calculations; streaming data; fast corrections and “what-if”; interactive sub-second queries on huge data sets
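The claim that risk does not sum over hierarchies can be shown numerically. Below is a toy example (made-up numbers, with the simplest possible tail measure standing in for VaR): two desks have their worst losses in different scenarios, so the parent's risk is far less than the sum of the children's.

```python
def worst_loss(pnl):
    """Simplest possible tail measure: the single worst outcome, as a positive loss."""
    return -min(pnl)

desk_a = [5, -10, 3, 2, 1]   # worst outcome in scenario 2
desk_b = [-8, 6, 2, 1, -1]   # worst outcome in scenario 1 (a different one!)
parent = [a + b for a, b in zip(desk_a, desk_b)]

print(worst_loss(desk_a) + worst_loss(desk_b))  # → 18 (naive sum of child risks)
print(worst_loss(parent))                       # → 4  (true parent risk)
```

Because the tail scenarios differ per node, an aggregate cannot be incremented from child aggregates; you have to keep full scenario vectors per node, which is what breaks the traditional OLAP approach.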
Challenge No 1.
Spaghetti

Find the seams
Break it up
Reusable components
Replace a piece at a time
Challenge No 2.
Develop a new service
Integrate into the legacy system
Reconcile the output
Find and fix legacy bugs
Fight complification
Challenge No 3.
Batch is synchronous state transfer. Is it the only way to achieve consistency?

Consistency is seriously hard to combine with streaming
An event-sourced and streaming approach is more robust, scalable and faster, especially for recovery
But it comes with a cost
Challenge No 4.
Legacy SQL was slow

Replace with in-memory aggregation
Partitions and horizontally scales out across commodity hardware
Tougher challenges on terabyte-scale hardware due to NUMA limitations; some cubes are already > 200 GB, with larger ones planned
Aggregate billions of scenarios in memory and pre-compute total vectors over hierarchies (linear)
Non-linear measures computed lazily
Reactive and continuous queries
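The split between linear pre-aggregation and lazy non-linear measures can be sketched as follows. This is a minimal illustration with a hypothetical path-style hierarchy and made-up numbers, not the actual implementation: scenario P&L *vectors* sum linearly up the tree (so they can be pre-computed and incremented), while the tail measure is computed only for the node being queried.

```python
from collections import defaultdict

# leaf node -> P&L vector over 3 scenarios (made-up numbers)
leaves = {
    "rates/dkk": [1.0, -4.0, 2.0],
    "rates/eur": [-2.0, -1.0, 3.0],
    "fx/eurusd": [0.5, -0.5, -2.0],
}

def parent_of(node):
    return node.rsplit("/", 1)[0] if "/" in node else "TOTAL"

# linear pre-aggregation: roll every leaf vector up to all of its ancestors
totals = defaultdict(lambda: [0.0] * 3)
for leaf, vec in leaves.items():
    node = leaf
    while True:
        for i, v in enumerate(vec):
            totals[node][i] += v
        if node == "TOTAL":
            break
        node = parent_of(node)

def worst_loss(vec):
    """Non-linear measure; computed lazily, only for the queried node."""
    return -min(vec)

print(totals["rates"])              # → [-1.0, -5.0, 5.0]
print(worst_loss(totals["TOTAL"]))  # → 5.5
```

Because the vector sum is associative, corrections to one leaf only require re-adding its delta along one path to the root, which is what makes fast incremental corrections possible.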
Solution: Microservices! Well, almost…

• Single responsibility: replace pieces of legacy from the inside out
• Self-contained with business functional boundaries
• Independent and rapid development: the team owns the whole stack
• Organisationally scalable: horizontally scale your teams
• Flexible and maintainable: evolve the architecture
• Smart endpoints and dumb pipes
• Innovation and short lifecycles
The problem

Business:
• Multi-model Market Risk calculator for the Nordea portfolio
• VaR on different organization levels, with 5-6 different models in parallel

IT:
• 7000 CPU hours of grid calculation
• More than 4000 SQL jobs
• A graph with more than 10000 edges
• Nightly batch flow
What did it look like?

• Well, you know. 10 years of development
• In SQL
• No refactoring (who needs it?)
Precisely, how did it look?
Logical architecture
Monolith staged app
Now a little complication

Sloo-o-o-ow
Fat, so it breaks
Can it be parallel?
So what to do?

We all probably know the answer (since we are at this session ☺)

- Find logically isolated blocks
- Keep an eye on the non-functional aspects
- Think about how they communicate
- Think about what happens if something dies
Not quite “classical” microservices… or are they?

produce → enrich → aggregate

- Request/response is not feasible
- Synchronous interaction takes too long
- Some results are expensive to reproduce
So we need…

A middleware which:

- “Glues” services together
- Caches important results
- Serves as a coordinator and work distributor
Scale out

Fast pub/sub
Queues and sets: pull and dedup
Distributed locks
Locks? Who needs locks?
Pub/sub messaging as notifier

[Diagram: Producer → Enricher → Aggregator → consumer, each service with its own store, notified via Redis pub/sub]
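The notifier pattern can be sketched in-process (no real Redis here): services write results to a shared store and publish only a small notification key on pub/sub; subscribers fetch the payload from the store when notified. With real Redis this would be SET/HSET plus PUBLISH/SUBSCRIBE through a client library; the `Bus` class below is a stand-in for the broker.

```python
from collections import defaultdict

class Bus:
    """Minimal in-process stand-in for Redis pub/sub fan-out."""
    def __init__(self):
        self.subscribers = defaultdict(list)
    def subscribe(self, channel, handler):
        self.subscribers[channel].append(handler)
    def publish(self, channel, message):
        for handler in self.subscribers[channel]:
            handler(message)

store = {}   # stand-in for the Redis key/value store
bus = Bus()
received = []

# Enricher subscribes: on notification, pull the (possibly large) payload by key
bus.subscribe("trades", lambda key: received.append(store[key]))

# Producer: store the heavy payload first, then publish only its key
store["trade:42"] = {"id": 42, "notional": 1_000_000}
bus.publish("trades", "trade:42")

print(received)  # → [{'id': 42, 'notional': 1000000}]
```

Keeping payloads in the store and sending only keys over pub/sub is what makes the messages cheap and the expensive results re-fetchable after a consumer restart.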
But…
There are two main problems in distributed messaging:
2) Guarantee that each message is only delivered once
1) Guarantee message order
2) Guarantee that each message is only delivered once
Queues with atomic operations: BRPOPLPUSH

[Diagram: Producer → Redis pub/sub → Enricher, which moves each message from an incoming queue to a processing queue before writing to the store]
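The reliable-queue pattern behind BRPOPLPUSH (BLMOVE in Redis 6.2+) works like this: atomically move a message from the incoming queue to a per-worker processing queue, and only remove it after successful processing; if the worker dies, the message is still in the processing queue and can be recovered. Below it is simulated with plain Python lists standing in for Redis lists; in real Redis the move is a single atomic command.

```python
incoming = ["msg-1", "msg-2", "msg-3"]
processing = []

def pop_to_processing():
    """Stand-in for BRPOPLPUSH incoming processing (atomic in real Redis)."""
    if not incoming:
        return None
    msg = incoming.pop()     # RPOP from the incoming queue...
    processing.append(msg)   # ...LPUSH onto the processing queue
    return msg

def ack(msg):
    """After successful processing, remove the message (LREM in real Redis)."""
    processing.remove(msg)

msg = pop_to_processing()
# If the worker crashed here, "msg-3" would survive in `processing` for recovery
assert processing == ["msg-3"]

ack(msg)                     # normal path: processed, then acknowledged
print(incoming, processing)  # → ['msg-1', 'msg-2'] []
```

The trade-off is at-least-once delivery: a recovered message may be processed twice, which is exactly why the next slide's dedup matters.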
Sets and hashmaps are all good for dedup

In an eventually consistent world, dedup is your best friend

Store via HSET: multiple inserts happen due to recovery, but the state stays consistent thanks to dedup
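The dedup idea rests on idempotent writes, as with Redis HSET: writing the same field twice (for example after a crash-recovery replay) leaves the hash in the same state, so at-least-once delivery still converges. A minimal sketch, with a dict standing in for a Redis hash:

```python
risk_by_trade = {}  # stand-in for a Redis hash

def hset(field, value):
    """HSET semantics: last write wins, so re-delivering the same write is a no-op."""
    risk_by_trade[field] = value

hset("trade:42", 123.4)
hset("trade:42", 123.4)   # duplicate insert due to recovery: state unchanged
hset("trade:43", -7.0)

print(risk_by_trade)      # → {'trade:42': 123.4, 'trade:43': -7.0}
```

As long as each message carries a stable key and a deterministic value, replays cost only wasted work, never wrong state.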
So how to scale out?

Logically: one Enricher per type (type A, type B, … type X), each filtering its own events from Redis pub/sub

Concurrently: one Aggregator per partition (day 1, day 2, day 3), stealing work under RedLock + TTL
Demo
[Diagram: the full pipeline: Producer → Enricher → Aggregator → consumer over Redis pub/sub, with incoming and processing queues, per-service stores, and RedLock + TTL]
The Result and What We Learned

Success!
• Aggregate and produce risk: 5 hours → 30 mins
• Corrections: 40 mins → 1 second
• Earlier deliveries: more time to manage the risks
• Faster recovery from problems
• Happy risk managers

It is important (and painful) to integrate new services into the existing system
Consistency is hard to combine with streaming (perhaps the subject of another talk)
When distributing, remember the First Law of Distributed Object Design (do you remember it?)
First Law of Distributed Object Design:
"Don't distribute your objects"
And of course…
https://dk.linkedin.com/in/charles-tye-a8aa88b
https://github.com/parallelstream/