ATHMoS:
A Satellite Anomaly Detection Framework
– Microservice Architecture
Corey O'Meara, German Space Operations Center
ESAW 2017
June 21st, 2017
> O'Meara • ATHMoS > 21.06.17 • www.DLR.de/rb • Slide 1
We want to compare new telemetry data with past data to see how its behavior has changed.
(Figure: Past Telemetry vs. Today's Telemetry)
General Idea of Anomaly Detection
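As a minimal, hypothetical illustration of "compare new telemetry with past data" (not the ATHMoS algorithm itself), the Scala sketch below summarizes each day of one parameter by its mean and reports how many standard deviations today's mean lies from the past daily means. `DailyComparison` and its functions are invented names for this example.

```scala
// Hypothetical illustration: compare today's behaviour of one telemetry
// parameter with past days by looking at daily means.
object DailyComparison {
  def mean(xs: Seq[Double]): Double = {
    require(xs.nonEmpty, "need at least one sample")
    xs.sum / xs.size
  }

  // Population standard deviation.
  def stdDev(xs: Seq[Double]): Double = {
    val m = mean(xs)
    math.sqrt(xs.map(x => (x - m) * (x - m)).sum / xs.size)
  }

  /** How many standard deviations today's mean lies from the past daily means. */
  def zScore(pastDays: Seq[Seq[Double]], today: Seq[Double]): Double = {
    val pastMeans = pastDays.map(mean)
    val diff = mean(today) - mean(pastMeans)
    val sd = stdDev(pastMeans)
    // With no spread in the past, any change at all is infinitely unusual.
    if (sd == 0.0) { if (diff == 0.0) 0.0 else math.signum(diff) * Double.PositiveInfinity }
    else diff / sd
  }
}
```

A single daily mean is a deliberately crude summary; a real system would compare richer per-day feature vectors, which is what the later steps work towards.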
Why Automated Telemetry Checking?
Big Data, Big Computations…
1. Noise Extraction and De-Trending
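A minimal sketch of step 1, assuming a centered moving average as the smoother (ATHMoS may well use a different filter): the smoothed series captures the trend, and the residual (raw minus smoothed) is the de-trended noise. `NoiseExtraction`, `movingAverage`, `decompose`, and `halfWidth` are hypothetical names.

```scala
object NoiseExtraction {
  /** Centered moving average; the window shrinks at the edges of the series. */
  def movingAverage(xs: Vector[Double], halfWidth: Int): Vector[Double] = {
    require(halfWidth >= 0, "halfWidth must be non-negative")
    xs.indices.map { i =>
      val lo = math.max(0, i - halfWidth)
      val hi = math.min(xs.size - 1, i + halfWidth)
      val window = xs.slice(lo, hi + 1)
      window.sum / window.size
    }.toVector
  }

  /** Splits a raw series into (smoothed trend, residual noise), so that raw = smoothed + noise. */
  def decompose(raw: Vector[Double], halfWidth: Int): (Vector[Double], Vector[Double]) = {
    val smoothed = movingAverage(raw, halfWidth)
    val noise = raw.zip(smoothed).map { case (r, s) => r - s }
    (smoothed, noise)
  }
}
```

A wider `halfWidth` removes more of the slow trend from the noise component but also blurs short events into the trend, so the window length is a per-parameter tuning choice.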
2. Sub-Cluster Dimension Approximation
3. Anomaly Score Computation
(Figure: Raw Data, Smoothed Data, Noise Data)
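One simple way to turn past data into an outlier score on a 0-100% scale is a k-nearest-neighbour distance. It is swapped in here purely for illustration and is not necessarily the ATHMoS scoring method. A new feature vector is scored by the distance to its k-th nearest past vector, and the result is the percentage of past vectors whose own (leave-one-out) k-distance is smaller. All names are hypothetical.

```scala
object KnnOutlierScore {
  type Point = Vector[Double]

  // Euclidean distance between two feature vectors of equal length.
  def dist(a: Point, b: Point): Double =
    math.sqrt(a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum)

  /** Distance from p to its k-th nearest neighbour in ref. */
  def kDistance(p: Point, ref: Seq[Point], k: Int): Double = {
    require(k >= 1 && k <= ref.size, "k must lie in 1..ref.size")
    ref.map(q => dist(p, q)).sorted.apply(k - 1)
  }

  /** Outlier score in percent: the share of past points whose own
    * leave-one-out k-distance is smaller than p's k-distance. */
  def scorePercent(p: Point, ref: IndexedSeq[Point], k: Int): Double = {
    require(ref.size > k, "need more than k reference points")
    val refDistances = ref.indices.map(i => kDistance(ref(i), ref.patch(i, Nil, 1), k))
    val d = kDistance(p, ref, k)
    100.0 * refDistances.count(_ < d) / refDistances.size
  }
}
```

Recomputing the leave-one-out distances on every call is quadratic in the size of the reference set; in practice they would be computed once per reference period and cached.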
4. Deep Learning NN: Time Series Forecasting
(Figure: 45-min Recent Data, 90-min Prediction)
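Training a forecaster like the one in the figure first requires cutting the telemetry series into input/target pairs. The sketch below assumes that the model reads 45 min of recent data and predicts the following 90 min, and that samples arrive at a fixed period. With a hypothetical 60 s period, that is 45 input and 90 target samples. It covers only data preparation, not the neural network, and all names are hypothetical.

```scala
object ForecastWindows {
  /** One training example: `input` is the recent history, `target` the values to predict. */
  final case class Example(input: Vector[Double], target: Vector[Double])

  /** Number of samples covering `minutes` at a sampling period of `periodSeconds`. */
  def samplesFor(minutes: Int, periodSeconds: Int): Int = minutes * 60 / periodSeconds

  /** Slides a window over the series: `inLen` samples as input, the next `outLen` as target. */
  def makeWindows(series: Vector[Double], inLen: Int, outLen: Int, step: Int = 1): Vector[Example] = {
    require(inLen > 0 && outLen > 0 && step > 0, "lengths and step must be positive")
    val lastStart = series.size - inLen - outLen // negative means: series too short, no windows
    (0 to lastStart by step).map { start =>
      Example(
        series.slice(start, start + inLen),
        series.slice(start + inLen, start + inLen + outLen))
    }.toVector
  }
}
```

For the setting in the figure, a call would look like `makeWindows(series, samplesFor(45, 60), samplesFor(90, 60))`.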
ATHMoS: System Architecture Requirements
High Availability:
Necessary to ensure 24/7 monitoring in the operational environment
Scalability:
The system should scale easily when more computational power is needed
Performance:
The complete chain, from import through processing to storage, should finish within a few minutes for a day's worth of telemetry data
Modifiability:
The processing chain must be easy to modify to allow mission-specific adjustments
What Are Microservices? (a.k.a. "fine-grained" SOA)
TelemetryDB: Apache Cassandra
Apache Cassandra
• Distributed database
• Supports multiple datacenters
• Highly available and scalable
• Originally developed at Facebook for Big Data
Used by (among many others): (Figure: company logos)
Data Processing: Apache Spark
Apache Spark
• Distributed computational engine
• Highly scalable, asynchronous computations
• "Next generation" Hadoop (up to ~100 times faster than Hadoop MapReduce)
• Supports near-real-time processing of data streams
Used by (among many others): (Figure: company logos)
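Spark jobs are usually written as chains of transformations such as `map` and `reduceByKey` on key-value RDDs. So that it runs without a cluster, the sketch below expresses a per-parameter mean in the same map/reduce shape with plain Scala collections. The `Sample` record layout is an assumption, not the ATHMoS schema.

```scala
object PerParameterStats {
  /** Hypothetical telemetry record: parameter id, timestamp (epoch ms), value. */
  final case class Sample(parameter: String, timestampMs: Long, value: Double)

  /** Per-parameter (count, mean) in map/reduce style: map each sample to
    * (key, (1, value)), group by key, then reduce the partial (count, sum) pairs. */
  def meanByParameter(samples: Seq[Sample]): Map[String, (Long, Double)] =
    samples
      .map(s => (s.parameter, (1L, s.value)))
      .groupBy(_._1)
      .map { case (param, keyed) =>
        val (n, sum) = keyed.map(_._2).reduce((a, b) => (a._1 + b._1, a._2 + b._2))
        (param, (n, sum / n))
      }
}
```

Carrying (count, sum) pairs instead of averaging early keeps the reduction associative, which is what lets a distributed engine combine partial results from different partitions in any order.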
Language: Scala
Akka (Scala)
• Framework for developing fault-tolerant services
• Both object-oriented and functional programming
• Asynchronous communication
• Type-safe; a better version of Python (IMO)
Used by (among many others): (Figure: company logos)
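The Akka actor API is not reproduced here. As a minimal stand-in for asynchronous, non-blocking processing, the sketch uses the standard library's `scala.concurrent.Future` to run independent per-parameter checks concurrently and combine their results. `AsyncChecks`, `analyse`, and its simple limit check are hypothetical.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object AsyncChecks {
  implicit val ec: ExecutionContext = ExecutionContext.global

  /** Hypothetical per-parameter check, run asynchronously: true if any sample exceeds the limit. */
  def analyse(parameter: String, samples: Seq[Double], limit: Double): Future[(String, Boolean)] =
    Future((parameter, samples.exists(_ > limit)))

  /** Starts one check per parameter; they run concurrently and are combined without blocking. */
  def analyseAll(data: Map[String, Seq[Double]], limit: Double): Future[Map[String, Boolean]] =
    Future.sequence(data.toSeq.map { case (p, xs) => analyse(p, xs, limit) }).map(_.toMap)

  /** Blocks only at the edge of the program, e.g. in a test or a main method. */
  def run(data: Map[String, Seq[Double]], limit: Double): Map[String, Boolean] =
    Await.result(analyseAll(data, limit), 10.seconds)
}
```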
Message Broker: Apache Kafka
Apache Kafka
• Originally developed at LinkedIn
• Open-sourced in 2011
• Distributed messaging (pub/sub) system
• By far the fastest w.r.t. messages/sec
Used by (among many others): (Figure: company logos)
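To illustrate the log-based pub/sub model behind Kafka, here is a toy, single-process sketch. It is not Kafka's client API. Producers append messages to an ordered log, and each consumer group keeps its own read offset, so several services can consume the same stream independently. `ToyTopic`, `publish`, and `poll` are hypothetical names.

```scala
import scala.collection.mutable

/** Toy model of a log-based topic: an append-only log plus one offset per consumer group. */
final class ToyTopic[A] {
  private val log = mutable.ArrayBuffer.empty[A]
  private val offsets = mutable.Map.empty[String, Int]

  /** Appends a message and returns its offset in the log. */
  def publish(msg: A): Int = synchronized {
    log += msg
    log.size - 1
  }

  /** Returns up to `max` unread messages for `group` and advances that group's offset. */
  def poll(group: String, max: Int = 100): Vector[A] = synchronized {
    val from = offsets.getOrElse(group, 0)
    val batch = log.slice(from, from + max).toVector
    offsets(group) = from + batch.size
    batch
  }
}
```

Because messages stay in the log after being read, a new consumer group (for example a second processing service) can start from offset 0 without affecting existing consumers.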
ATHMoS Microservices Overview
Microservice Orchestration: DC/OS
Deployment Infrastructure
(Figure: deployment nodes, labeled x2, x6, and x2)
Cluster Details
App Scheduler
DC/OS Dashboard
App Launcher and Scheduler
Cluster Details
Allows for a Reactive System
App Resource Monitoring
(Figure: Application Instance CPU Usage, Application Instance RAM Usage, Application Instance Cached Memory Usage, Application Instance Network Usage, Total System Usage Statistics)
ATHMoS as a Reactive System
The distributed nature of the system lets us define CPU, RAM, and network connection/traffic limits for each software component, so that a component under heavy load is scaled elastically on its own.
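The slide does not give the scaling rule itself. A minimal sketch, assuming a proportional policy like the one documented for the Kubernetes Horizontal Pod Autoscaler (desired = ceil(current * observed / target)), applied to the most heavily used of CPU, RAM, and network and clamped to configured bounds. `ElasticScaling`, `Usage`, and `desiredInstances` are hypothetical.

```scala
object ElasticScaling {
  /** Observed usage of one component, averaged over its instances,
    * each value a fraction of the per-instance limit (may exceed 1.0). */
  final case class Usage(cpu: Double, ram: Double, network: Double)

  /** Proportional rule: desired = ceil(current * load / target), where load is the
    * most heavily used resource; the result is clamped to [minInstances, maxInstances]. */
  def desiredInstances(current: Int, usage: Usage, target: Double,
                       minInstances: Int, maxInstances: Int): Int = {
    require(current >= 1 && target > 0.0 && 1 <= minInstances && minInstances <= maxInstances)
    val load = math.max(usage.cpu, math.max(usage.ram, usage.network))
    val raw = math.ceil(current * load / target).toInt
    math.min(maxInstances, math.max(minInstances, raw))
  }
}
```

For example, two instances running at 90% CPU against a 50% target would be scaled to ceil(2 * 0.9 / 0.5) = 4 instances, while the other components keep their current size.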
Outlook: Microservice Template
(Figure: Infrastructure Layer, Domain Layer, Application Layer)
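A hypothetical sketch of such a three-layer template follows. The domain layer holds pure types and rules, and the application layer orchestrates one use case through interfaces. The infrastructure layer supplies concrete adapters, shown here as in-memory stand-ins for what would, e.g., be database or message-broker clients. All names are invented for illustration.

```scala
import scala.collection.mutable.ListBuffer

// Domain layer: pure types and rules, no I/O.
object Domain {
  final case class Reading(parameter: String, value: Double)
  final case class Alert(parameter: String, value: Double)

  def checkLimit(r: Reading, upper: Double): Option[Alert] =
    if (r.value > upper) Some(Alert(r.parameter, r.value)) else None
}

// Interfaces the application layer depends on; the infrastructure layer implements them.
trait ReadingSource { def fetch(): Seq[Domain.Reading] }
trait AlertSink { def publish(alert: Domain.Alert): Unit }

// Application layer: orchestrates one use case using domain rules and the interfaces above.
final class LimitCheckService(source: ReadingSource, sink: AlertSink, upper: Double) {
  /** Fetches readings, publishes an alert per limit violation, returns the alert count. */
  def runOnce(): Int = {
    val alerts = source.fetch().flatMap(r => Domain.checkLimit(r, upper))
    alerts.foreach(sink.publish)
    alerts.size
  }
}

// Infrastructure layer: in-memory stand-ins for real adapters.
final class InMemorySource(data: Seq[Domain.Reading]) extends ReadingSource {
  def fetch(): Seq[Domain.Reading] = data
}
final class CollectingSink extends AlertSink {
  val received: ListBuffer[Domain.Alert] = ListBuffer.empty
  def publish(alert: Domain.Alert): Unit = { received += alert; () }
}
```

Keeping I/O behind interfaces means a mission-specific service can swap adapters without touching the domain rules, which matches the modifiability requirement stated earlier.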
Outlook: Continuous Integration and Delivery
In Conclusion… read this!
Thanks for your attention!
@cpomeara
Backup Slides
Frontend Prototype
Frontend Prototype
Frontend Prototype
Example: Battery Voltage Drop
(Figure: Triggered OOL)
The battery voltage anomaly was first detected when an OOL (out-of-limit) check triggered during an image acquisition datatake in eclipse.
Example: Battery Voltage Drop
ATHMoS correctly identified not only the anomaly that triggered the OOL but also anomalies of the same type that occurred more than one month earlier.
(Figure: Outlier Score (0-100%), July 2015, with the OOL event marked)
Example: Battery Voltage Drop
(Figure: Outlier Score (0-100%), June 2015)