Copyright © 2018 Northrop Grumman Systems Corporation, All Rights Reserved
Micro-Service Data Pipelines
August 2018
Connor Wood, Systems Engineer
Mission Systems
Northrop Grumman
Space and Missile Defense Symposium
Copyright © 2018 Northrop Grumman Systems Corporation, All Rights Reserved Approved For Public Release #18-1494; Unlimited Distribution
Introduction to Micro-services
What are micro-services?
• Micro-service architectures provide a means for software developers to
create loosely coupled software systems that can be modified, tested,
and deployed quickly and safely. In simplest terms, a micro-service is
a software component that "does one thing, and does it well"
Figure: Representation of a monolithic architecture, where all the functionality is in a single process
Figure: Representation of a micro-service architecture, where the functionality is split into separate processes that communicate through some lightweight mechanism
Why micro-services?
Advantages
Monolith:
• Simplicity
• Consistency
• Easy to refactor
Micro-service:
• Scalability and independent deployment
• Individual service availability and failure recovery
• Preserve modularity
• Multiple platforms
Resources
• 12FactorApp.net, written by Adam Wiggins
• Microservices, GOTO conference 2014, Martin Fowler
Why micro-services in this mission?
• Keep pace with the adversary
• Need validation and quality
• Need evolving Weapon Systems
Timeliness and quality of this turnaround depend on the flow of data, whether in
operational processing or in testing and analysis.
Streamlining a Weapon System Data Pipeline
What we had
Diagram: User Data Need, Monolithic Scenario Definition, Monolithic Simulation, Logged Data, Analysis Product
The data produced by the simulation was not always ready for an analyst.
Format changes between versions added to the analysis task.
So we post-processed to make a standardized analysis database
Diagram: User Data Need, Monolithic Scenario Definition, Monolithic Simulation, Logged Data, Post Processing, Modified Database, Analysis Product
Standardized analysis was often the same, so it could be automated
So we automated it
Diagram: User Data Need, Monolithic Scenario Definition, Monolithic Simulation, Logged Data, Post Processing, Modified Database, Automated Analysis, Analysis Product
The speed of analysis encouraged Monte Carlo what-if studies
So we segmented the monolith
Diagram: User Data Need, Monolithic Scenario Definition, Simulation Part 1, Part 1 Logged Data, Simulation Part 2, Part 2 Logged Data, Post Processing, Modified Database, Automated Analysis, Analysis Product
Part 1 performed Monte Carlo sampling of inputs derived from the user inputs,
so the user didn’t need to specify every single scenario
If the user wanted to change the Monte Carlo set, we didn’t want to
re-derive and re-process everything in a batch when we could reuse
previous sets
So we segmented even more
Diagram: User Data Need, Monolithic Scenario Definition, Simulation Part 1, Part 1 Logged Data, Simulation Part 2, Part 2 Logged Data, Simulation Part 3, Part 3 Logged Data, Post Processing, Modified Database, Automated Analysis, Analysis Product
This allowed us to loosely couple sets of inputs, derived inputs,
processing, and outputs
Not only did we want to reuse data sets, but we also wanted to finish
new requests as fast as possible
How could we make it faster?
Diagram: User Data Need, Monolithic Scenario Definition, Simulation Part 1, Part 1 Logged Data, Simulation Part 2, Part 2 Logged Data, Simulation Part 3, Part 3 Logged Data, Post Processing, Modified Database, Automated Analysis, Analysis Product
Part 2 and Part 3 were bottlenecks in the pipeline
Because they had been broken apart and loosely coupled, we could
farm out the processing to a zombie net
Hardware Architecture
Our application server could punch out Part 3 exceptionally fast in
parallel, but was bottlenecked by Part 2 serial processing
Workstations were faster serial processors than the application server
Data cluster messaging bus
We moved data logging at each processing step to the network data
cluster, allowing the lab to run as a zombie net on each processing step
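A minimal sketch of how lab machines might share one processing step through a common data store: each worker claims a task file by renaming it into a claimed area. The directory layout, file names, worker names, and `claim_next_task` are hypothetical; the slides only say that logging moved to the network data cluster.

```python
import os
import tempfile
from pathlib import Path


def claim_next_task(pending_dir, claimed_dir, worker_id):
    """Claim the first pending task file, or return None if none remain.

    os.rename is atomic on a single POSIX filesystem, so when two workers
    race for the same file only one rename succeeds and the loser moves on.
    """
    for task in sorted(Path(pending_dir).iterdir()):
        target = Path(claimed_dir) / f"{worker_id}__{task.name}"
        try:
            os.rename(task, target)
            return target
        except FileNotFoundError:
            continue  # another worker claimed this task first
    return None


# Demonstration in a temporary directory standing in for the shared store.
root = Path(tempfile.mkdtemp())
pending, claimed = root / "pending", root / "claimed"
pending.mkdir()
claimed.mkdir()
for i in range(3):
    (pending / f"part2_run{i:03d}.json").write_text("{}")

claims = [claim_next_task(pending, claimed, w) for w in ["ws-a", "ws-b", "ws-a", "ws-b"]]
```

Rename semantics on network filesystems vary, so a real deployment would need to confirm that the shared store gives the same guarantee.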
Thus distributing the Part 2 and Part 3 processing
Diagram: User Data Need, Monolithic Scenario Definition, Simulation Part 1, Part 1 Logged Data, Simulation Part 2, Part 2 Logged Data, Simulation Part 3, Part 3 Logged Data, Post Processing, Modified Database, Automated Analysis, Analysis Product
Naturally we became more sensitive towards finding bottlenecks
So we added metrics to monitor throughput
Diagram: User Data Need, Monolithic Scenario Definition, Simulation Part 1, Part 1 Logged Data, Simulation Part 2, Part 2 Logged Data, Simulation Part 3, Part 3 Logged Data, Post Processing, Modified Database, Automated Analysis, Analysis Product, Metrics
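Throughput monitoring of this kind can be sketched as a small accumulator that records items and elapsed time per stage and reports the slowest stage. The class name `ThroughputMetrics`, the stage names, and the numbers are illustrative assumptions, not the metrics the pipeline actually collected.

```python
from collections import defaultdict


class ThroughputMetrics:
    """Accumulate per-stage item counts and busy time (hypothetical helper)."""

    def __init__(self):
        self.items = defaultdict(int)
        self.seconds = defaultdict(float)

    def record(self, stage, n_items, elapsed_s):
        """Add one batch of work to a stage's running totals."""
        self.items[stage] += n_items
        self.seconds[stage] += elapsed_s

    def throughput(self, stage):
        """Items per second for one stage; 0.0 if no time has been recorded."""
        elapsed = self.seconds[stage]
        return self.items[stage] / elapsed if elapsed > 0 else 0.0

    def bottleneck(self):
        """Name of the recorded stage with the lowest throughput so far."""
        return min(self.items, key=self.throughput)


metrics = ThroughputMetrics()
# Illustrative numbers only, not measurements from the presentation.
metrics.record("part1", n_items=100, elapsed_s=10.0)
metrics.record("part2", n_items=100, elapsed_s=50.0)
metrics.record("part3", n_items=100, elapsed_s=20.0)
slowest = metrics.bottleneck()
```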
With the pipeline and hardware off to the races, more data availability
created more user data need
A user needed special scenarios for Part 3-specific data on a short turnaround
Hacking Part 1 data, we satisfied the quick turn
Diagram: User Data Need, Part 1 Logged Data, Simulation Part 2, Part 2 Logged Data, Simulation Part 3, Part 3 Logged Data
With specialty requests coming in, we chose to modify Part 1 Logged
Data to be easier to use as an input data set, while still being an
output data set from Part 1
Effectively streamlining user data need
Diagram: User Data Need, Part 1 Logged Data, Simulation Part 2, Part 2 Logged Data, Simulation Part 3, Part 3 Logged Data
Our end-to-end pipeline just needed a user need as input, regardless of whether it started at Part 1 or Part 2
Diagram: User Data Need, Monolithic Scenario Definition, Simulation Part 1, Part 1 Logged Data, Simulation Part 2, Part 2 Logged Data, Simulation Part 3, Part 3 Logged Data, Post Processing, Modified Database, Automated Analysis, Analysis Product, Metrics
Because of the segmentation and loose coupling, we could work changes
on any module in parallel
This created uncertainty about whether what each module received or
produced was good
So we added single run and batch V&V
Diagram: User Data Need, Monolithic Scenario Definition, Simulation Part 1, Part 1 Logged Data, Simulation Part 2, Part 2 Logged Data, Simulation Part 3, Part 3 Logged Data, Post Processing, Modified Database, Automated Analysis, Analysis Product, Metrics, Single Run V&V, Batch V&V
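The slides do not say which checks the single-run and batch V&V performed, so the following is only one possible reading: a single-run check validates one run's output record, and a batch check compares an aggregate over many runs against a reference. The record schema, field names, and tolerance are all hypothetical.

```python
import math
from statistics import mean

REQUIRED_FIELDS = ("run_id", "module", "value")  # hypothetical record schema


def verify_single_run(record):
    """Return a list of problems found in one run's output (empty means pass)."""
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS if f not in record]
    value = record.get("value")
    if isinstance(value, float) and not math.isfinite(value):
        problems.append("value is not finite")
    return problems


def verify_batch(records, expected_mean, tolerance):
    """Check that the mean 'value' of the passing runs is near a reference."""
    passing = [r for r in records if not verify_single_run(r)]
    if not passing:
        return False
    return abs(mean(r["value"] for r in passing) - expected_mean) <= tolerance


# Illustrative batch: two good runs, one non-finite value, one missing field.
batch = [
    {"run_id": 1, "module": "part2", "value": 9.8},
    {"run_id": 2, "module": "part2", "value": 10.2},
    {"run_id": 3, "module": "part2", "value": float("nan")},
    {"run_id": 4, "module": "part2"},
]
single_run_reports = [verify_single_run(r) for r in batch]
batch_ok = verify_batch(batch, expected_mean=10.0, tolerance=0.5)
```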
We started recognizing a pattern, and found that we were effectively
rediscovering the textbook benefits of micro-service architectures
How we could make it even better still (i.e. lessons learned)
• We were using a hybrid of Bash, Perl, Python, LaTeX, and MATLAB
as glue for C++, C, and Java
– Loose coupling and multiple components allowed us to develop in multiple
languages, whatever was easiest for the engineer working a particular change
– The modular and flexible environment was more favorable for development, and
worked better with smaller components
– Major parts were usually restricted to a single language and a smaller subset of
developers
• The parallelization was clunky, and only really supported Part 2 and
Part 3, using file-I/O-based data exchange between processes
– A network of micro-services would allow any service to be scaled to support the
processing, and enable inter-process communication with lower latency
– Some of the one-off special studies could operate as sub-networks of
micro-services
Designing a Micro-service Network
Data production as a micro-service
• Service structure
– Check out or subscribe to processing tasks based on user inputs
– Produce a set of outputs which can be checked out as input tasks by downstream
micro-services
• Services become independent
• Services become scalable
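The service structure above can be sketched with standard-library queues: each service checks out tasks from an inbox and publishes outputs that a downstream service checks out in turn. This is an illustration under stated assumptions. Threads stand in for separate processes or hosts, and `run_service`, `stop_service`, and the toy work functions are hypothetical.

```python
import queue
import threading

_STOP = object()  # sentinel telling a worker to shut down


def run_service(work_fn, inbox, outbox, n_workers):
    """Start n_workers threads that check out tasks from inbox and publish to outbox."""

    def worker():
        while True:
            task = inbox.get()
            if task is _STOP:
                break
            outbox.put(work_fn(task))

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    return threads


def stop_service(threads, inbox):
    """Send one stop sentinel per worker, then wait for all of them to finish."""
    for _ in threads:
        inbox.put(_STOP)
    for t in threads:
        t.join()


# Two chained services: the outputs of the first are the input tasks of the second.
user_needs, derived, products = queue.Queue(), queue.Queue(), queue.Queue()
stage_a = run_service(lambda x: x * 2, user_needs, derived, n_workers=3)
stage_b = run_service(lambda x: x + 1, derived, products, n_workers=1)

for need in range(5):
    user_needs.put(need)

stop_service(stage_a, user_needs)  # all stage A outputs are now in `derived`
stop_service(stage_b, derived)     # all stage B outputs are now in `products`
results = sorted(products.get() for _ in range(5))
```

Each stage is scaled independently through `n_workers`, which is the property the bullets describe: services only share queues, so adding workers to one stage does not touch the others.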
Diagram: User Data Need, Data Product Service
What if we treated every module as a distributable service?
Data Structure
• Change each service’s data structure
from input/output to include a module
and version identifier as metadata
• When a unique service module /
version sees unprocessed inputs, it
checks out the highest-priority task
(i.e. the oldest) to work on
• This allows automated generation of
data using multiple modules /
versions
– And comparison of outputs using
common inputs
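A minimal sketch of such a data structure, assuming simple in-memory records: outputs carry the module and version that produced them, each module/version checks out the oldest input it has not yet processed, and outputs for the same input can be compared across versions. The class and field names (`Task`, `Result`, `DataStore`, `created`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class Task:
    """An input record; `created` orders tasks so the oldest is served first."""
    task_id: str
    created: float
    inputs: Dict[str, Any]


@dataclass
class Result:
    """An output record tagged with the module and version that produced it."""
    task_id: str
    module: str
    version: str
    outputs: Dict[str, Any]


@dataclass
class DataStore:
    tasks: List[Task] = field(default_factory=list)
    results: List[Result] = field(default_factory=list)

    def check_out(self, module: str, version: str) -> Optional[Task]:
        """Oldest task this module/version has not yet processed, else None."""
        done = {r.task_id for r in self.results
                if (r.module, r.version) == (module, version)}
        pending = [t for t in self.tasks if t.task_id not in done]
        return min(pending, key=lambda t: t.created) if pending else None

    def compare(self, task_id: str) -> Dict[Tuple[str, str], Dict[str, Any]]:
        """Outputs from every module/version that processed the same inputs."""
        return {(r.module, r.version): r.outputs
                for r in self.results if r.task_id == task_id}


store = DataStore(tasks=[Task("t2", 20.0, {"x": 3}), Task("t1", 10.0, {"x": 2})])
first = store.check_out("part2", "1.0")

# Two versions of the same hypothetical module process the same inputs.
versions = {"1.0": lambda x: 2 * x, "1.1": lambda x: 2 * x + 1}
for version, fn in versions.items():
    task = store.check_out("part2", version)
    while task is not None:
        outputs = {"y": fn(task.inputs["x"])}
        store.results.append(Result(task.task_id, "part2", version, outputs))
        task = store.check_out("part2", version)

side_by_side = store.compare("t1")
```

The sketch treats a task as checked out only once its result is stored. With several concurrent workers, a real store would also need an atomic claim step.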
Diagram: a data record with Inputs and Outputs, and a data record with Module / Version Identifier, Inputs, and Outputs
Micro-service network structure
Diagram: User Data Need
Depending on user data need, the network scales on the job that needs doing
Diagram: User Data Need
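One simple way to picture the network scaling on the job at hand is to split a fixed pool of workers across services in proportion to each service's backlog. This is a sketch under assumptions: the slides do not describe a scaling policy, and `allocate_workers` and the backlog numbers are hypothetical.

```python
def allocate_workers(backlog, total_workers):
    """Split total_workers across services in proportion to pending tasks.

    Uses largest-remainder rounding so the counts sum to total_workers
    (or to zero when there is no pending work).
    """
    total = sum(backlog.values())
    if total == 0:
        return {name: 0 for name in backlog}
    shares = {name: total_workers * n / total for name, n in backlog.items()}
    alloc = {name: int(share) for name, share in shares.items()}
    leftover = total_workers - sum(alloc.values())
    by_remainder = sorted(shares, key=lambda name: shares[name] - alloc[name], reverse=True)
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc


# Illustrative backlog: most of the pending work sits in Part 2.
plan = allocate_workers({"part1": 10, "part2": 60, "part3": 30}, total_workers=8)
```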
Definition of desired outputs allows sub-networks of micro-services to be called
Diagram: User Data Need
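Calling a sub-network from a definition of desired outputs can be sketched as a walk upstream through a dependency map. The wiring below reuses module names from this talk, but the product names, the `PRODUCES` / `CONSUMES` tables, and `required_services` are hypothetical.

```python
# Hypothetical wiring: what each service produces and what it consumes.
PRODUCES = {
    "part1": ["part1_data"],
    "part2": ["part2_data"],
    "part3": ["part3_data"],
    "post_processing": ["database"],
    "analysis": ["analysis_product"],
    "metrics": ["metrics_report"],
}
CONSUMES = {
    "part1": ["user_need"],
    "part2": ["part1_data"],
    "part3": ["part2_data"],
    "post_processing": ["part3_data"],
    "analysis": ["database"],
    "metrics": ["part2_data", "part3_data"],
}


def required_services(desired_outputs, produces, consumes):
    """Walk upstream from the desired outputs and collect every service needed."""
    producer_of = {out: svc for svc, outs in produces.items() for out in outs}
    needed, stack = set(), list(desired_outputs)
    while stack:
        product = stack.pop()
        service = producer_of.get(product)
        if service is None or service in needed:
            continue  # raw user input, or a service that is already included
        needed.add(service)
        stack.extend(consumes.get(service, []))
    return needed


quick_study = required_services(["part2_data"], PRODUCES, CONSUMES)
full_run = required_services(["analysis_product"], PRODUCES, CONSUMES)
```

A request for only Part 2 data activates two services, while a request for the analysis product activates the whole chain but leaves the metrics service out.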
Industry Tools & What’s Coming Next
Industry Tools & What’s coming next
• Docker: Operating-System-level virtualization (i.e. containerization)
• Kubernetes: Automatic deployment, scaling, and management (i.e.
container orchestration)
• Jenkins: Continuous integration, continuous testing, continuous delivery
• Chaos Monkey (Netflix): IT infrastructure resilience, operational resilience
– Chaos Gorilla, Latency Monkey, Doctor Monkey, Janitor Monkey, Conformity Monkey,
Security Monkey, 10-18 (i.e. Regional) Monkey
Micro-services are a strong candidate for enterprise architectures, but adopters need
to consider team skill, decomposability of the architecture, and network latency
“Often the true consequences of your architectural decisions are only
evident several years after you made them.” - Martin Fowler