
Post on 24-Sep-2020


Page 1: PowerPoint Presentation · Title: PowerPoint Presentation Author: Vitanachy Katarzyna (CR/PJ-AI-S3);Cinar Goktug (CR/PJ-AI-S1);Lade Prasanth (CR/PJ-AI-S1) Created Date: 6/7/2020 11:46:22

Leveraging Spark to develop AI-enabled products and services at Bosch


Agenda

Manufacturing Analytics Solution: Prasanth Lade

Financial Forecasting: Goktug Cinar


Robert Bosch – a worldwide leading IoT Company

268 manufacturing sites

1000s of assembly lines

409,881 associates

60 countries

460 local subsidiaries

Four business sectors: Mobility Solutions, Industrial Technology, Energy & Building Technology, Consumer Goods

Bosch Center for Artificial Intelligence locations: Sunnyvale, Pittsburgh, Renningen, Tübingen, Haifa, Bangalore, Shanghai


Manufacturing Analytics Solution


Manufacturing Analytics using Spark

Self-Serve Analytics Pipeline

• Automate data pipelining and preparation

• Centralize data storage across assembly lines and plants

• Scalable compute and storage resources

• Standard analytics dashboards

• Self-service analysis

• Advanced analytics tools such as root cause analysis

[Pipeline diagram components: Bosch Manufacturing Plants, Kafka, Hadoop File System, Data Preparation, Root Cause Analysis, Apache Impala, Tableau Extracts, Tableau Server]


Manufacturing Analytics using Spark

Why are parts failing quality checks?

[Assembly line diagram: Process 1, Process 2, Process 3, Process 4, Process 5]

Potential root causes

• Measured process parameters

• Machine configurations

• Tools and components used

• Locations visited

Target of interest

• Identify quality test failures for certain parts.


Manufacturing Analytics using Spark

Root Cause Analysis: Modules

1. Part graph generation: The assembly process of every unique part is represented as a graph.

2. Feature extraction: Features are extracted from the part graph.

3. Feature matrix generation: Target variables are mapped to features.

4. Root cause modeling: Statistical models are applied to extract potential root causes.
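As a rough illustration of the last two modules, the sketch below maps a pass/fail target to per-part features and ranks features by how much the failure rate rises when a feature is present. The slides do not name the statistical models used, so the failure-rate difference is only a stand-in, and the part IDs, feature names, and outcomes are made-up sample data.

```python
# Hypothetical sample data: features per part and a pass/fail target.
features = {
    "P1": {"f1", "f2"},
    "P2": {"f1", "f2", "f3"},
    "P3": {"f2", "f3"},
    "P4": {"f1"},
}
failed = {"P1": False, "P2": True, "P3": True, "P4": False}


def failure_rate(part_ids, failed):
    """Share of failed parts among the given part IDs (0.0 if there are none)."""
    if not part_ids:
        return 0.0
    return sum(failed[p] for p in part_ids) / len(part_ids)


def rank_root_causes(features, failed):
    """Rank features by failure rate with the feature minus the rate without it."""
    all_features = set().union(*features.values())
    scores = {}
    for f in all_features:
        with_f = [p for p, fs in features.items() if f in fs]
        without_f = [p for p, fs in features.items() if f not in fs]
        scores[f] = failure_rate(with_f, failed) - failure_rate(without_f, failed)
    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


ranking = rank_root_causes(features, failed)
for name, score in ranking:
    print(f"{name}: {score:+.2f}")
```

A real pipeline would replace the scoring step with a proper statistical test or model; the point here is only the shape of the data flowing from feature matrix to ranked candidates.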


Manufacturing Analytics using Spark

Root Cause Analysis: Sample code

Sample part graph: LOC 1, LOC 2, LOC 3, LOC 4 (each location carries parameters, tests, tools, etc.)

Part graphs

PART_ID     PART_GRAPH
B6788098    [graph]
FF556828    [graph]
A6678B34    [graph]

Feature extractor

Features

PART_ID     FEATURES
B6788098    [f1, f2]
FF556828    [f1, f2, f3, f4]
A6678B34    [f2, f3]


Manufacturing Analytics using Spark

Root Cause Analysis: Sample code

Feature extractor example
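A feature extractor of this kind might look like the following minimal sketch. It assumes a part graph is an ordered list of visited locations, each carrying the parameters and tools recorded there. The `extract_features` function, the dictionary layout, and the feature naming scheme are all hypothetical and are not taken from the Bosch implementation.

```python
# Hypothetical part graph: locations in visiting order, with what was recorded there.
part_graph = [
    {"location": "LOC 1", "parameters": {"torque": 4.2}, "tools": ["T7"]},
    {"location": "LOC 2", "parameters": {"pressure": 1.1}, "tools": []},
    {"location": "LOC 3", "parameters": {}, "tools": ["T2", "T9"]},
]


def extract_features(graph):
    """Flatten a part graph into a dict of named features."""
    features = {}
    for step, node in enumerate(graph, start=1):
        loc = node["location"]
        # Position of the location in the part's route through the line.
        features[f"visited:{loc}"] = step
        # One feature per measured process parameter.
        for name, value in node["parameters"].items():
            features[f"param:{loc}:{name}"] = value
        # One indicator feature per tool used at this location.
        for tool in node["tools"]:
            features[f"tool:{loc}:{tool}"] = 1
    return features


features = extract_features(part_graph)
for name, value in features.items():
    print(name, value)
```

Applied per part (for example inside a Spark map over part graphs), a function like this would produce the PART_ID to FEATURES mapping shown on the previous slide.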


Manufacturing Analytics using Spark

Root Cause Analysis: Computational Complexity

▪ The volume of computations needed to identify root causes on a monthly basis:

Total assembly lines: ~10,000

Avg. number of parts produced (per assembly line): ~2 million

Avg. number of data records in HDFS (per assembly line): ~30 billion
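Taking these averages at face value, the overall totals follow by simple multiplication. The short calculation below is only a back-of-the-envelope sketch; the variable names are illustrative.

```python
assembly_lines = 10_000              # ~10,000 assembly lines
parts_per_line = 2_000_000           # ~2 million parts per line
records_per_line = 30_000_000_000    # ~30 billion HDFS records per line

total_parts = assembly_lines * parts_per_line          # parts across all lines
total_records = assembly_lines * records_per_line      # records across all lines
records_per_part = records_per_line / parts_per_line   # average records per part

print(f"total parts:      {total_parts:.1e}")
print(f"total records:    {total_records:.1e}")
print(f"records per part: {records_per_part:,.0f}")
```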


Manufacturing Analytics using Spark

Root Cause Analysis: The Challenge

Feature matrix generation

Dependent features

PART_ID     FEATURES
B6788098    [f1, f2]
FF556828    [f1, f2, f3, f4]
A6678B34    [f2, f3]

Independent features

PART_ID     FEATURES
B6788098    [g1]
FF556828    [g1, g5, g6]
A6678B34    [g1, g2]

X =

DEPENDENT   INDEPENDENT
f1          [[g1], [g1], [g1]]
f2          [[g1, None], [g1, None], [g1, g2]]
f3          [[None, None], [g5, g6], [None, None]]

Challenge

• How to scale feature matrix generation for products with increasing volumes.

Solution

• Replaced loops with Python functional constructs such as map, filter, reduce, and partial functions.

Runtime: 7 hours before, 2 hours after
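The functional style named in the solution can be sketched as follows. This is one plausible reading of feature matrix generation, not the authors' code: for each dependent feature it gathers the independent feature lists of the parts that carry it and pads them with `None` to a common width. It does not try to reproduce the exact matrix on the slide, and every function name here is hypothetical.

```python
from functools import partial, reduce

# Sample data copied from the slide's two PART_ID / FEATURES tables.
dependent = {
    "B6788098": ["f1", "f2"],
    "FF556828": ["f1", "f2", "f3", "f4"],
    "A6678B34": ["f2", "f3"],
}
independent = {
    "B6788098": ["g1"],
    "FF556828": ["g1", "g5", "g6"],
    "A6678B34": ["g1", "g2"],
}


def has_feature(feature, part_id):
    """True if the part carries the given dependent feature."""
    return feature in dependent[part_id]


def pad(width, row):
    """Return a copy of row padded with None up to the given width."""
    return row + [None] * (width - len(row))


def rows_for(feature):
    """Independent feature rows of all parts that carry the dependent feature."""
    parts = filter(partial(has_feature, feature), dependent)
    rows = list(map(independent.get, parts))
    width = reduce(max, map(len, rows), 0)
    return list(map(partial(pad, width), rows))


all_features = sorted(reduce(set.union, map(set, dependent.values()), set()))
X = dict(zip(all_features, map(rows_for, all_features)))

for feature, rows in X.items():
    print(feature, rows)
```

Because `rows_for` depends only on its argument and read-only inputs, the same function can be handed to a distributed map without restructuring, which is the main practical benefit of dropping explicit loops.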


Financial Forecasting


Large Scale Forecasting using Spark: Background and Motivation

Team

▪ Collaboration between controllers, programmers, data engineers, and data scientists

Goal

• Automatically generate sales forecasts

• Increase efficiency, objectivity, and accuracy

• Improve financial decision making for Bosch

Results

• Monthly forecast of KPIs (>300,000 time series; target 3-4M time series)

• Combination of 15+ cutting-edge mathematical models (with two different data transformations) in one tool

• Automated model selection and hierarchically consistent forecasts


Large Scale Forecasting using Spark

Scale of the task

• 15+ companies under the Bosch group

• Each company has a specific business structure

• First application is revenue forecasting

• Revenue can be broken down by customer, product, region, and business division

• Forecasts are needed monthly, immediately after the month-closing calculations.

Task: Millions of forecasts within a few hours

• Assume we have 1 million time series

• 5 models per time series → 5M forecasts

• ~5 seconds per model → compute time of 25M seconds

• 1000s of cores needed
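The sizing argument can be checked with a few lines of arithmetic. The sketch below multiplies the stated figures (1 million series, 5 models each, ~5 seconds per model) and divides by a time window. The 3-hour window is an assumption standing in for "a few hours", and the estimate ignores scheduling overhead and assumes perfect parallelism.

```python
import math

n_series = 1_000_000       # assumed number of time series
models_per_series = 5      # models fitted per series
seconds_per_model = 5      # approximate fit-and-forecast time per model
window_hours = 3           # assumption: "a few hours" taken as 3

n_forecasts = n_series * models_per_series          # total model runs
total_seconds = n_forecasts * seconds_per_model     # single-core compute time
cores_needed = math.ceil(total_seconds / (window_hours * 3600))

print(f"forecasts:     {n_forecasts:,}")
print(f"compute time:  {total_seconds:,} s")
print(f"cores needed:  {cores_needed:,}")
```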


Large Scale Forecasting using Spark

Technical Architecture

1. Create hierarchical time series

2. Automated model selection using AI

3. AI-based time series forecast

4. Consolidate hierarchy

Model families: traditional models, hybrid models, hierarchical models, state space models

Kubernetes
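Steps 2 to 4 can be illustrated with a toy example. The slides do not say how models are selected or how the hierarchy is reconciled, so the sketch below makes two loud assumptions: selection picks the model with the smallest error on a held-out last observation, and consolidation is a plain bottom-up sum. The three models and the sample series are made up for illustration.

```python
from statistics import mean

# Hypothetical bottom-level revenue series, e.g. one per region.
series = {
    "region_a": [10.0, 12.0, 14.0, 16.0],
    "region_b": [5.0, 5.0, 5.0, 5.0],
}

# Stand-in models: each maps a history to a one-step-ahead forecast.
models = {
    "naive": lambda h: h[-1],
    "mean": lambda h: mean(h),
    "drift": lambda h: h[-1] + (h[-1] - h[0]) / (len(h) - 1),
}


def select_model(history):
    """Pick the model with the smallest absolute error on the last observation."""
    train, actual = history[:-1], history[-1]
    errors = {name: abs(model(train) - actual) for name, model in models.items()}
    # Iterating over sorted names makes ties resolve alphabetically.
    return min(sorted(errors), key=errors.get)


def forecast(history):
    """Return the selected model name and its forecast for the next period."""
    name = select_model(history)
    return name, models[name](history)


bottom = {key: forecast(values) for key, values in series.items()}

# Bottom-up consolidation: the total is defined as the sum of its children,
# so the forecasts are hierarchically consistent by construction.
total_forecast = sum(value for _, value in bottom.values())

print(bottom)
print("total:", total_forecast)
```

Bottom-up summation is the simplest way to get consistent totals; other reconciliation schemes exist, and nothing in the slides indicates which one the tool uses.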


Large Scale Forecasting using Spark

▪ The task is embarrassingly parallel!

Why R?

• The latest and most popular forecasting models are published in R.

• We can utilize these packages via user-defined functions in Spark.

Why Spark?

• Each core can receive one time series and the names of the models to be applied.

• Compute forecasts.

• Return the combined results back to the master node.
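The pattern described here, one task per time series plus a list of model names, with results gathered at the end, can be sketched without Spark. In the example below a Python thread pool stands in for the Spark executors and two trivial Python functions stand in for the R forecasting packages; the task layout and all names are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

# Stand-in models keyed by name; the real system calls R packages instead.
MODELS = {
    "naive": lambda history: history[-1],
    "mean": lambda history: mean(history),
}


def forecast_task(task):
    """Run the named models on one time series and return the combined results."""
    series_id, history, model_names = task
    return series_id, {name: MODELS[name](history) for name in model_names}


# Each task is independent: (series ID, history, names of models to apply).
tasks = [
    ("ts_1", [1.0, 2.0, 3.0], ["naive", "mean"]),
    ("ts_2", [4.0, 4.0, 10.0], ["naive"]),
    ("ts_3", [7.0, 9.0], ["mean"]),
]

# Workers process tasks independently; the caller collects all results,
# mirroring the "return combined results to the master node" step.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = dict(pool.map(forecast_task, tasks))

print(results)
```

Since no task reads another task's data, throughput scales with the number of workers until scheduling and data transfer overheads dominate.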


Large Scale Forecasting using Spark

Sparklyr vs. SparkR

▪ Sparklyr
  ▪ Accepts data frames
  ▪ Returns data frames

▪ SparkR
  ▪ Accepts data frames or lists
  ▪ Returns data frames or lists
  ▪ More flexibility

Sparklyr UDF API

spark_apply(): applies a function to each row or group of a SparkDataFrame


Large Scale Forecasting using Spark

Spark – lessons learned

▪ User-defined functions (UDFs) in SparkR via spark.lapply()

▪ UDFs over lists are more flexible

▪ They allow changes to the modeling and the use of heterogeneous data without much change to the overall architecture

▪ Use SparkR::spark.addFile to send files needed on all executors

▪ SparkR::spark.lapply() fails on a list with more than ~46k elements (solved in JIRA issue SPARK-25234)


Large Scale Forecasting using Spark

Performance Gains

*computation time for 1893 time series


Thank you!

Abhirup Mallik (Bosch), Abishek Prasanna (Bosch), Jeff Thompson (Bosch), Kasia Vitanachy (Bosch), Lisa Marion Garcia (Bosch), Matthew Jones (Bosch)

Nicolas Douard (Virtue Foundation), Patrick Emmerich (Bosch), Phil Gaudreau (LinkedIn), Ruobing Chen (Facebook), Sascha Vetter (Bosch), Zichu Li (University of Rochester)


Feedback

Your feedback is important to us.

Don’t forget to rate and review the sessions.
