powerpoint presentation · title: powerpoint presentation author: vitanachy katarzyna...
Leveraging Spark to develop AI-enabled products and services at Bosch
Agenda
Manufacturing Analytics Solution: Prasanth Lade
Financial Forecasting: Goktug Cinar
Robert Bosch – a worldwide leading IoT Company
268 Manufacturing sites
1000s Assembly lines
409,881 Associates
60 Countries
460 Local subsidiaries
Four business sectors
Mobility Solutions
Industrial Technology
Energy & Building Technology
Consumer Goods
Sunnyvale
Pittsburgh
Renningen
Tubingen
Haifa
Bangalore
Shanghai
Bosch Center for Artificial Intelligence
Manufacturing Analytics Solution
Manufacturing Analytics using Spark: Self-Serve Analytics Pipeline
• Automate data pipelining and preparation
• Centralize data storage across assembly lines and plants
• Scalable compute and storage resources
• Standard analytics dashboards
• Self-service analysis
• Advanced analytics tools such as root cause analysis
[Pipeline diagram components: Bosch Manufacturing Plants, Kafka, Hadoop File System, Data Preparation, Root Cause Analysis, Apache Impala, Tableau Extracts, Tableau Server]
Manufacturing Analytics using Spark: Why are parts failing quality checks?
Process 1
Process 2
Process 3
Process 4
Process 5
Potential root causes
• Measured process parameters
• Machine configurations
• Tools and components used
• Locations visited
Target of interest: Identify quality test failures for certain parts.
Manufacturing Analytics using Spark: Root Cause Analysis: Modules
• Part graph generation: The assembly process of every unique part is represented as a graph.
• Feature extraction: Features are extracted from the part graph.
• Feature matrix generation: Target variables are mapped to features.
• Root cause modeling: Statistical models are applied to extract potential root causes.
[Diagram labels on each process step: Parameters, Tests, Tools, etc.]
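The part graph generation step can be pictured with a small sketch. The record fields (`part_id`, `location`, `timestamp`, `parameters`) and the simple chain-shaped graph are assumptions made for illustration; the slides do not show the actual schema or graph structure.

```python
from collections import defaultdict

# Hypothetical process records; field names are illustrative only.
records = [
    {"part_id": "B6788098", "location": "LOC 2", "timestamp": 2, "parameters": {"torque": 4.1}},
    {"part_id": "B6788098", "location": "LOC 1", "timestamp": 1, "parameters": {"pressure": 0.9}},
    {"part_id": "A6678B34", "location": "LOC 1", "timestamp": 5, "parameters": {"pressure": 1.2}},
]


def build_part_graphs(records):
    """Group records by part and order them by time into nodes and edges."""
    by_part = defaultdict(list)
    for rec in records:
        by_part[rec["part_id"]].append(rec)

    graphs = {}
    for part_id, recs in by_part.items():
        ordered = sorted(recs, key=lambda r: r["timestamp"])
        nodes = [{"location": r["location"], "parameters": r["parameters"]} for r in ordered]
        # An edge links each visited location to the next one in time.
        edges = [(a["location"], b["location"]) for a, b in zip(nodes, nodes[1:])]
        graphs[part_id] = {"nodes": nodes, "edges": edges}
    return graphs


part_graphs = build_part_graphs(records)
```

Each part ends up with its own small graph, so later steps can process parts independently of one another.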
Manufacturing Analytics using Spark: Root Cause Analysis: Sample code
Part graphs
PART_ID PART_GRAPH
B6788098 [graph]
FF556828 [graph]
A6678B34 [graph]
Sample part graph: LOC 1, LOC 2, LOC 3, LOC 4
Feature extractor
Features
PART_ID FEATURES
B6788098 [f1, f2]
FF556828 [f1, f2, f3, f4]
A6678B34 [f2, f3]
Manufacturing Analytics using Spark: Root Cause Analysis: Sample code
Feature extractor example
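The code shown on this slide is not part of the transcript. As a rough illustration only, a feature extractor over a part graph might look like the sketch below. The graph layout, the function name `extract_features`, and the feature naming scheme are all assumptions, not the presenters' implementation.

```python
def extract_features(part_graph):
    """Flatten a part graph into named features such as 'LOC 1.pressure'."""
    features = {}
    for node in part_graph["nodes"]:
        for name, value in node["parameters"].items():
            features[f"{node['location']}.{name}"] = value
        # Also record that the part visited this location at all.
        features[f"visited.{node['location']}"] = 1
    return features


# Hypothetical part graph with two visited locations.
sample_graph = {
    "nodes": [
        {"location": "LOC 1", "parameters": {"pressure": 0.9}},
        {"location": "LOC 2", "parameters": {"torque": 4.1}},
    ]
}
features = extract_features(sample_graph)
```

Because the function takes one part graph and returns one feature dictionary, it could in principle be applied to every row of a part-graph table in parallel.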
Manufacturing Analytics using Spark: Root Cause Analysis: Computational Complexity
▪ The volume of computations needed to identify root causes on a monthly basis:
Total assembly lines: ~10000
Avg. # of parts produced (per assembly line): ~2 Million
Avg. # of data records in HDFS (per assembly line): ~30 Billion
Manufacturing Analytics using Spark: Root Cause Analysis: The Challenge
Feature matrix generation
PART_ID FEATURES
B6788098 [f1, f2]
FF556828 [f1, f2, f3, f4]
A6678B34 [f2, f3]
PART_ID FEATURES
B6788098 [g1]
FF556828 [g1, g5, g6]
A6678B34 [g1, g2]
X =
DEPENDENT INDEPENDENT
f1 [ [g1], [g1],[g1] ]
f2 [ [g1,None], [g1, None], [g1, g2] ]
f3 [ [None, None],[g5, g6],[None, None] ]
Challenge
• How to scale feature matrix generation for products with increasing volumes.
Solution
• Replaced loops with Python functional constructs such as map, filter, reduce, and partial functions.
Before: 7 hours
After: 2 hours
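To make the functional style concrete, here is a small sketch that rebuilds the X matrix from the slide using `map`, `filter`, `reduce`, and `functools.partial`. The `columns_for` mapping (which independent columns go with each dependent feature) is an assumption inferred from the example values; the slides do not say how that pairing is chosen.

```python
from functools import partial, reduce

# Per-part independent features, copied from the slide.
independent = {
    "B6788098": ["g1"],
    "FF556828": ["g1", "g5", "g6"],
    "A6678B34": ["g1", "g2"],
}

# Hypothetical pairing of dependent features with independent columns.
columns_for = {"f1": ["g1"], "f2": ["g1", "g2"], "f3": ["g5", "g6"]}


def row_for_part(columns, part_features):
    """Return one row: the column name if the part has it, else None."""
    present = set(part_features)
    return list(map(lambda c: c if c in present else None, columns))


def add_dependent(matrix, item):
    """Fold one dependent feature and its rows into the matrix."""
    dependent, columns = item
    build_row = partial(row_for_part, columns)  # fix the column list once
    return {**matrix, dependent: list(map(build_row, independent.values()))}


# Skip dependent features that have no independent columns to pair with.
usable = filter(lambda item: item[1], columns_for.items())
feature_matrix = reduce(add_dependent, usable, {})
```

Each row is computed by a pure function of its inputs, which is what lets a framework such as Spark distribute the same logic across partitions.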
Financial Forecasting
Large Scale Forecasting using Spark: Background and Motivation
Team
▪ Collaboration between controllers, programmers, data engineers, and data scientists
Goal
• Automatically generate sales forecasts
• Increase efficiency, objectivity, and accuracy
• Improve financial decision making for Bosch
Results
• Monthly forecast of KPIs (>300,000 time series; target 3-4M time series)
• Combination of 15+ cutting-edge mathematical models (with two different data transformations) in one tool
• Automated model selection and hierarchically consistent forecasts
Large Scale Forecasting using Spark
15+ companies under the Bosch group
• Each company has specific business structure
• First application is for revenue forecasting
• Revenue can be broken down by customer, product, region, and business divisions
Scale of the task
• Forecasts are needed monthly, immediately after the month-closing calculations.
Task: Millions of forecasts within a few hours
• Assume we have 1 million time series
• 5 models per time series → 5M forecasts
• ~5 seconds per model → Compute time of 25M seconds
• 1000s of cores needed
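The sizing argument is simple multiplication followed by a division by the available time. The sketch below works through it with the slide's stated inputs; the 3-hour deadline is an assumption standing in for "a few hours".

```python
import math

n_series = 1_000_000      # from the slide
models_per_series = 5     # from the slide
seconds_per_model = 5     # approximate, from the slide
deadline_hours = 3        # assumption: "a few hours" taken as 3

total_forecasts = n_series * models_per_series
total_core_seconds = total_forecasts * seconds_per_model

# Cores needed if the work is spread perfectly evenly with no overhead.
cores_needed = math.ceil(total_core_seconds / (deadline_hours * 3600))
```

Under these assumptions the estimate comes to a little over two thousand cores, and real scheduling overhead would push it higher.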
Large Scale Forecasting using Spark: Technical Architecture
1. Create Hierarchical Time Series
2. Automated Model Selection using AI
3. AI based Time Series Forecast
4. Consolidate Hierarchy
Traditional Models
Hybrid Models
Hierarchical Models
State Space Models
Kubernetes
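The slides do not say how the hierarchy is consolidated. One common approach is bottom-up aggregation, where forecasts are made at the lowest level and summed upward so that every level agrees. The sketch below illustrates that approach only; the (region, product) keys and the numbers are made up.

```python
# Hypothetical leaf-level forecasts keyed by (region, product), two months each.
leaf_forecasts = {
    ("EU", "A"): [10.0, 12.0],
    ("EU", "B"): [5.0, 6.0],
    ("US", "A"): [7.0, 8.0],
}


def add_series(a, b):
    """Element-wise sum of two equally long forecast lists."""
    return [x + y for x, y in zip(a, b)]


def consolidate_bottom_up(leaf_forecasts):
    """Sum leaf forecasts into region totals and a grand total."""
    horizon = len(next(iter(leaf_forecasts.values())))
    regions = {}
    total = [0.0] * horizon
    for (region, _product), forecast in leaf_forecasts.items():
        regions[region] = add_series(regions.get(region, [0.0] * horizon), forecast)
        total = add_series(total, forecast)
    return {"total": total, "regions": regions, "leaves": dict(leaf_forecasts)}


hierarchy = consolidate_bottom_up(leaf_forecasts)
```

By construction, the region forecasts add up to the total, which is the consistency property the Results slide refers to.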
Large Scale Forecasting using Spark
Why R?
• Latest and most popular models for forecasting are published in R.
• We can utilize these packages via user-defined functions in Spark.
Why Spark?
▪ The task is embarrassingly parallelizable!
• Each core can receive one time series and the names of the models to be applied.
• Compute forecasts.
• Return the combined results back to the master node.
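The per-core pattern described above can be sketched in a few lines. This is an illustration only: a local thread pool stands in for Spark executors, and the two toy models (`naive`, `mean`) are hypothetical placeholders for the R forecasting packages the team actually uses.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-in models, not the ones used in the talk.
def naive(series, horizon):
    """Repeat the last observed value."""
    return [series[-1]] * horizon


def mean(series, horizon):
    """Repeat the historical average."""
    return [sum(series) / len(series)] * horizon


MODELS = {"naive": naive, "mean": mean}


def forecast_one(task):
    """Receive one time series plus model names and return all its forecasts."""
    series_id, series, model_names, horizon = task
    return series_id, {name: MODELS[name](series, horizon) for name in model_names}


tasks = [
    ("s1", [1.0, 2.0, 3.0], ["naive", "mean"], 2),
    ("s2", [4.0, 4.0], ["naive"], 2),
]

# Each task is independent, so workers never need to communicate.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = dict(pool.map(forecast_one, tasks))
```

Since no task depends on another, adding workers shortens the wall-clock time roughly in proportion, up to the number of time series.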
Large Scale Forecasting using Spark: Sparklyr vs. SparkR
Sparklyr
▪ Accepts data frames
▪ Returns data frames
SparkR
▪ Accepts data frames or lists
▪ Returns data frames or lists
▪ More flexibility
Sparklyr UDF API
▪ spark_apply(): Applies a function to each row or group of SparkDataFrame
Large Scale Forecasting using Spark: Spark – lessons learned
▪ User-defined functions (UDFs) in SparkR via spark.lapply()
▪ UDFs over lists are more flexible
▪ Enables changing the modeling and using heterogeneous data without a lot of change to the overall architecture
▪ Use SparkR::spark.addFile for sending files needed in all executors
▪ SparkR::spark.lapply() fails when we have a list with more than ~46k elements (solved in JIRA Issue: [SPARK-25234])
Large Scale Forecasting using Spark: Performance Gains
*computation time for 1893 time series
Thank you!
Abhirup Mallik (Bosch), Abishek Prasanna (Bosch), Jeff Thompson (Bosch), Kasia Vitanachy (Bosch), Lisa Marion Garcia (Bosch), Matthew Jones (Bosch),
Nicolas Douard (Virtue Foundation), Patrick Emmerich (Bosch), Phil Gaudreau (LinkedIn), Ruobing Chen (Facebook), Sascha Vetter (Bosch), Zichu Li (University of Rochester)