TRANSCRIPT
Dean Wampler, Ph.D. [email protected] polyglotprogramming.com/talks
Trends Affecting System Architectures: Mobile Engagement, Streaming Data, and Machine Learning
© Dean Wampler & Lightbend, 2007-2019 @deanwampler
lbnd.io/fast-data-ebook
Data Streaming Architectures
lightbend.com/lightbend-platform @deanwampler
But first, introductions…
What we’re up to at Lightbend… lightbend.com/lightbend-pipelines-demo
What We’ll Discuss
• Forces driving us towards stream processing
• Streaming architectures
  • Requirements
  • An example
• Mixing data science and data engineering
• Other production concerns
Forces driving us towards stream processing
[Images: Energy, Medical, Finance, Telecom, Mobile, … and IoT]
State of the art phone!
Information value has a half life;
it decays with time
Historically, software developers have downplayed the importance of data. We can’t anymore.
Streaming Architectures: Requirements
• Reliability - fault and “surprise” tolerant
• Availability - “always on”
• Low latency - for some definition of “low”
• Scalability - up and down
• Adaptability - ideally without restarts
Reminds me of microservices
[Diagram labels: Browse, REST, API Gateway; microservices: Account, Orders, Shopping Cart, Inventory, …]
Streaming Architectures: An Example
[Architecture diagram]
Data Center components:
• Kafka Cluster (Broker)
• Spark (mini-batch, batch)
• Low-latency microservices (Kafka Streams, Akka Streams, …)
• Device Session Microservices (sessions, streams)
• Persistence (storage)
Numbered flows:
1. Device telemetry
2. → Model Training
3. ← New models
4. ← Model Storage
5. → Boot up, historical data
6. → Data Pipeline
7. → Model Serving
8. ← Anomalies
9. Ingest scores
10. Corrective Action
Detect anomalies in IoT devices
Three groups of functionality:
• Session management, REST microservices
• Model scoring
• Model training
Ingest device telemetry to Kafka
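The reason for landing telemetry in a log first is that several consumers can then read the same records at their own pace. The sketch below is a toy stand-in for a single topic, written with the Python standard library only. It is not the Kafka client API: the class name `InMemoryTopic`, its methods, and the record fields are all hypothetical and exist only to illustrate appending records and reading from an offset.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class InMemoryTopic:
    """Toy append-only log standing in for one topic (hypothetical, not Kafka)."""
    records: List[Dict[str, Any]] = field(default_factory=list)

    def append(self, record: Dict[str, Any]) -> int:
        """Append a record and return its offset (its position in the log)."""
        self.records.append(record)
        return len(self.records) - 1

    def read_from(self, offset: int) -> List[Dict[str, Any]]:
        """Return every record at or after the given offset; nothing is removed."""
        return self.records[offset:]


telemetry = InMemoryTopic()
for reading in (71.5, 72.0, 98.6):
    telemetry.append({"device_id": "device-1", "temperature": reading})

# Two independent readers keep their own offsets, so a training job and a
# scoring service could each consume the same telemetry at their own pace.
training_view = telemetry.read_from(0)
scoring_view = telemetry.read_from(2)
```

Because reading never removes records, a slow batch consumer and a fast low-latency consumer do not interfere with each other.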
Analogs
[Diagram comparing data at rest with streaming data]
• At rest: HDFS
• Streaming: Kafka Cluster (Broker)
• Batch: Spark, MapReduce, …
• Mini-batch: Spark, Flink, …
• Microservices: Kafka Streams, Akka Streams, …
Read telemetry into a periodic Spark job for model training to detect anomalies
Large data volume, long latency (seconds to days)
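As a rough illustration of what a periodic training step produces, here is a minimal sketch that fits a mean and standard deviation to one batch of readings. This is an assumption made for illustration: the talk does not say which model is trained, the real job would run in Spark rather than plain Python, and the function name `train_model` and the sample readings are hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, List


def train_model(readings: List[float]) -> Dict[str, float]:
    """Fit a mean/standard-deviation model to one batch of telemetry readings."""
    if len(readings) < 2:
        raise ValueError("need at least two readings to estimate a spread")
    return {"mean": mean(readings), "stdev": stdev(readings)}


# Hypothetical batch of temperature readings collected since the last run.
batch = [70.0, 71.0, 69.0, 70.0, 70.0]
model = train_model(batch)
```

The output is a small set of parameters, which is what makes it practical to publish each new model as a message.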
Updated model parameters are written back to Kafka in a new topic
Updated model parameters are also written to longer-term storage (for system restarts, auditing, …)
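A minimal sketch of that idea, assuming the parameters fit in a small JSON document: write them out with a version number, then read them back the way a service might on boot. The file name, the `save_model` and `load_model` helpers, and the JSON layout are hypothetical; the talk does not specify a storage format.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


def save_model(path: Path, version: int, params: Dict[str, float]) -> None:
    """Write model parameters plus a version number as a JSON document."""
    path.write_text(json.dumps({"version": version, "params": params}))


def load_model(path: Path) -> Dict[str, Any]:
    """Read the stored model back, e.g. when a scoring service boots up."""
    return json.loads(path.read_text())


# A temporary directory stands in for the longer-term store in this sketch.
with tempfile.TemporaryDirectory() as tmp:
    model_file = Path(tmp) / "model-v1.json"
    save_model(model_file, version=1, params={"mean": 70.0, "stdev": 0.7})
    restored = load_model(model_file)
```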
Ingest the telemetry data to score it, looking for anomalies
Small data volume, low latency (milliseconds-…)
Ingest model parameters to update the model in the low-latency microservices for model serving
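The sketch below shows one way a scoring loop could pick up new parameters without a restart: model updates and telemetry arrive as interleaved messages, and each telemetry record is scored against whichever model arrived most recently. Everything here is an assumption for illustration: the message layout, the `score_stream` function, the z-score rule, and the threshold are hypothetical, and a real service would consume from Kafka with Kafka Streams or Akka Streams rather than from a Python list.

```python
from typing import Dict, Iterable, Iterator, Optional


def score_stream(messages: Iterable[Dict], threshold: float = 3.0) -> Iterator[Dict]:
    """Consume interleaved model updates and telemetry; yield anomalies.

    "model" messages replace the current parameters; "telemetry" messages
    are scored against them. Assumes the model's stdev is greater than zero.
    """
    model: Optional[Dict[str, float]] = None
    for message in messages:
        if message["kind"] == "model":
            model = message["params"]  # swap parameters in place, no restart
        elif message["kind"] == "telemetry" and model is not None:
            z = abs(message["value"] - model["mean"]) / model["stdev"]
            if z > threshold:
                yield {"device_id": message["device_id"], "z_score": z}


# Hypothetical message sequence: a model, two readings, a new model, a reading.
messages = [
    {"kind": "model", "params": {"mean": 70.0, "stdev": 1.0}},
    {"kind": "telemetry", "device_id": "device-1", "value": 70.5},
    {"kind": "telemetry", "device_id": "device-2", "value": 75.0},
    {"kind": "model", "params": {"mean": 75.0, "stdev": 1.0}},
    {"kind": "telemetry", "device_id": "device-2", "value": 75.0},
]
anomalies = list(score_stream(messages))
```

The same reading of 75.0 is flagged under the first model and accepted under the second, which is why it matters to know which model scored which record.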
Write the detected anomalies back to Kafka in a new topic
Read anomaly information into microservices that manage the devices remotely
Take corrective action, e.g., upload more information, download patches, control-alt-delete, …
Streaming Architectures: An Example
Variation: Model Serving as a Service
[Architecture diagram repeated from the example: models are trained in Spark and served inside the low-latency microservices]
From this…
[Architecture diagram, variation]
Data Center components:
• Kafka Cluster (Broker)
• TensorFlow, … (reached over REST, gRPC, …)
• Low latency: Kafka Streams, Akka Streams, …
• Device Session Microservices (sessions, streams)
• Persistence (storage)
Numbered flows:
1. Device telemetry
2. → Model Training
3. ← Model Storage
4. → Boot up, historical data
5. → Data Pipeline
6. Model Serving
7. ← Anomalies
8. Ingest scores
9. Corrective Action
Both training and serving done by a separate service, like TensorFlow
To this…
Streaming Architectures: An Example
Variation: Model Serving on the Edge
[Architecture diagram repeated from the example: telemetry is scored in the data center by the low-latency microservices]
From this…
[Architecture diagram, variation]
Data Center components:
• Kafka Cluster (Broker)
• Spark (mini-batch, batch)
• Device Session Microservices (sessions, streams)
• Persistence (storage)
Numbered flows:
1. Device telemetry
2. → Model Training
3. ← New models
4. ← Model Storage
5. → Boot up, historical data
6. Model Parameters
7. Model Parameters
8. Corrective Action
Push model updates to the device and score there. Eliminates round-trip overhead for scoring.
To this…
Mixing data science and data engineering
Tools
• Software Engineering toolbox
• Data Science toolbox
Process
Data Science:
• Less process oriented
• Iterative, experimental
• … but still within the Scientific Method
Data Engineering:
• Process oriented
• Agile Manifesto
• … which does not mention data!
https://derwen.ai/s/6fqt
Philosophy
Data Science:
• Comfortable with uncertainty
• Probabilities and Statistics
Data Engineering:
• Uncomfortable with uncertainty
• Prefer determinism
• … while admitting that distributed systems aren’t deterministic
Other Production Concerns
Models Are Data
Auditing
• Kind of model
• Parameters and hyperparameters
• When trained
• Data used for training
• When deployed, undeployed, etc.
• …
• Quality metrics
• Serving metrics (how many records, scoring times, …)
• Provenance of decision to retrain
• The metrics gathered above that were used to decide when to retrain
• … and maybe most important: what model was used to score a particular record?
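One way to make that last question answerable is to treat model metadata as a record of its own and stamp every score with the identifier of the model that produced it. The sketch below assumes this approach; the class names, field names, and sample values are hypothetical and cover only a few of the audit items listed above.

```python
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass(frozen=True)
class ModelInfo:
    """Audit metadata for one trained model (field names are hypothetical)."""
    model_id: str
    kind: str
    params: Dict[str, float]
    trained_at: str
    training_data: str


@dataclass(frozen=True)
class ScoredRecord:
    """A score that remembers which model produced it."""
    record_id: str
    score: float
    model_id: str


# Illustrative values only; none of these come from a real system.
model = ModelInfo(
    model_id="anomaly-v7",
    kind="z-score",
    params={"mean": 70.0, "stdev": 1.0},
    trained_at="2019-01-01T00:00:00Z",
    training_data="telemetry offsets 0-9999",
)
registry = {model.model_id: model}

scored = ScoredRecord(record_id="rec-42", score=5.0, model_id=model.model_id)
used = registry[scored.model_id]  # answers: which model scored this record?
```

Since both classes are plain data, they can be serialized with `asdict` and stored or published like any other record.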
Model Updates
Concept Drift
Models have a half life, too
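A minimal sketch of how a service might notice that a model has aged, assuming feedback is available on whether each score turned out to be wrong: track the error rate over a sliding window and flag the model once it rises well above the rate seen at training time. The `DriftMonitor` class, its rule, and its thresholds are hypothetical, not a method described in the talk.

```python
from collections import deque
from typing import Deque


class DriftMonitor:
    """Flag a model for retraining when its recent error rate rises.

    Hypothetical rule: retrain once the error rate over the last `window`
    outcomes exceeds `baseline` by more than `tolerance`.
    """

    def __init__(self, baseline: float, tolerance: float, window: int) -> None:
        self.baseline = baseline
        self.tolerance = tolerance
        self.window = window
        self.errors: Deque[int] = deque(maxlen=window)

    def observe(self, was_error: bool) -> bool:
        """Record one outcome; return True when retraining looks warranted."""
        self.errors.append(1 if was_error else 0)
        if len(self.errors) < self.window:
            return False  # not enough evidence yet
        rate = sum(self.errors) / len(self.errors)
        return rate > self.baseline + self.tolerance


monitor = DriftMonitor(baseline=0.1, tolerance=0.15, window=5)
outcomes = [False, False, False, False, True, True, False, True]
decisions = [monitor.observe(outcome) for outcome in outcomes]
```

Recording the window contents at the moment a decision flips to True would also give the provenance of the decision to retrain.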
Recap
Information value has a half life;
it decays with time
Streaming systems have requirements similar to those of microservices
Integrating data science and data engineering is hard
Models also have a half life; they have to be updated periodically
[Photos: Mars last summer; dusty Milky Way]
References
• Ideas:
  • O’Reilly Radar: Data, AI, others
  • distill.pub
  • The Algorithm
  • The Gradient
• General Information about Stream Processing:
  • My O'Reilly Report on Architectures
  • Streaming Systems Book
  • Stream Processing with Apache Spark
  • Designing Data-Intensive Apps book
• Other Talks:
  • Strata Talk on ML in a Streaming Context
  • Stream All the Things! (video)
  • Streaming Microservices with Akka Streams and Kafka Streams (video)
• Tutorials:
  • Model serving in streams
  • Stream processing with Kafka and microservices
What we’re up to at Lightbend… lightbend.com/lightbend-pipelines-demo
Buy my stuff!!
Dean Wampler, Ph.D. [email protected]
@deanwampler lightbend.com/lightbend-platform
polyglotprogramming.com/talks
Questions?