J&J Thesis Presentation July 2016
![Page 1: J&J Thesis Presentation July 2016](https://reader031.vdocuments.us/reader031/viewer/2022022414/5873a59b1a28aba3548b4cc9/html5/thumbnails/1.jpg)
Machine Learning Techniques In Categorical Time Series Analysis
Of Manufacturing Process
Haris Michailidis, Isidora Tourni
National Technical University of Athens
School of Electrical and Computer Engineering
Professor: Nectarios Koziris
J&J Responsible: Michalis Avgoulis
Presentation Date: 26/07/2016
![Page 2: J&J Thesis Presentation July 2016](https://reader031.vdocuments.us/reader031/viewer/2022022414/5873a59b1a28aba3548b4cc9/html5/thumbnails/2.jpg)
Contents
● Problem Motivation
● Visualization
● Machine Learning
● Results
● Future Work
2
![Page 3: J&J Thesis Presentation July 2016](https://reader031.vdocuments.us/reader031/viewer/2022022414/5873a59b1a28aba3548b4cc9/html5/thumbnails/3.jpg)
Introduction
3
![Page 4: J&J Thesis Presentation July 2016](https://reader031.vdocuments.us/reader031/viewer/2022022414/5873a59b1a28aba3548b4cc9/html5/thumbnails/4.jpg)
Problem Motivation
In cooperation with Johnson & Johnson Hellas
Goals:
● Visualization of the Mixing Process
● Quantification of Procedures
● Classification & Clustering of Processes
Further Goals:
● Optimization of the Mixing Process
● Comparison with the Gold Standard (Evaluation)
● Comparison between different batches of the same Product
4
![Page 5: J&J Thesis Presentation July 2016](https://reader031.vdocuments.us/reader031/viewer/2022022414/5873a59b1a28aba3548b4cc9/html5/thumbnails/5.jpg)
Process Description
Example Vessel Actions:
● Heating
● Agitation
● Addition of Materials
● Pressure adjustment
PLC logging
● Output to CSV
Raw Materials → Mixing → Bottling
5
![Page 6: J&J Thesis Presentation July 2016](https://reader031.vdocuments.us/reader031/viewer/2022022414/5873a59b1a28aba3548b4cc9/html5/thumbnails/6.jpg)
Product Categories
6
Emulsion
Product Cleaning Group
![Page 7: J&J Thesis Presentation July 2016](https://reader031.vdocuments.us/reader031/viewer/2022022414/5873a59b1a28aba3548b4cc9/html5/thumbnails/7.jpg)
Product Categories
7
Picsou C
Product Group
Apple Cream
![Page 8: J&J Thesis Presentation July 2016](https://reader031.vdocuments.us/reader031/viewer/2022022414/5873a59b1a28aba3548b4cc9/html5/thumbnails/8.jpg)
Data-Set Description
~130,000 rows/year
45 message code sets (values, set-points)
8
Categorical Data
![Page 9: J&J Thesis Presentation July 2016](https://reader031.vdocuments.us/reader031/viewer/2022022414/5873a59b1a28aba3548b4cc9/html5/thumbnails/9.jpg)
Visualization
9
![Page 10: J&J Thesis Presentation July 2016](https://reader031.vdocuments.us/reader031/viewer/2022022414/5873a59b1a28aba3548b4cc9/html5/thumbnails/10.jpg)
Visualization Tool
Goals:
● Visualization of the Mixing Process
● Selective representation of variables
● Overview with flexible timeframe
● Accessible from multiple terminals (web interface)
10
The human brain processes visuals 60,000 times faster than text. *
* Forrester CSO Insights 2012
![Page 11: J&J Thesis Presentation July 2016](https://reader031.vdocuments.us/reader031/viewer/2022022414/5873a59b1a28aba3548b4cc9/html5/thumbnails/11.jpg)
Visualization Tool (interface)
11
Initial Page of the Visualization Tool
![Page 12: J&J Thesis Presentation July 2016](https://reader031.vdocuments.us/reader031/viewer/2022022414/5873a59b1a28aba3548b4cc9/html5/thumbnails/12.jpg)
Visualization Tool (interface)
12
4-day overview
![Page 13: J&J Thesis Presentation July 2016](https://reader031.vdocuments.us/reader031/viewer/2022022414/5873a59b1a28aba3548b4cc9/html5/thumbnails/13.jpg)
Visualization Tool (interface)
13
1-day overview
![Page 14: J&J Thesis Presentation July 2016](https://reader031.vdocuments.us/reader031/viewer/2022022414/5873a59b1a28aba3548b4cc9/html5/thumbnails/14.jpg)
Visualization Tool (interface)
14
Detail box in the complex visualization
![Page 15: J&J Thesis Presentation July 2016](https://reader031.vdocuments.us/reader031/viewer/2022022414/5873a59b1a28aba3548b4cc9/html5/thumbnails/15.jpg)
Machine Learning
15
![Page 16: J&J Thesis Presentation July 2016](https://reader031.vdocuments.us/reader031/viewer/2022022414/5873a59b1a28aba3548b4cc9/html5/thumbnails/16.jpg)
Goal: Explore the potential of Machine Learning in manufacturing, to produce useful insights into the process.
● Classification
● Clustering
Challenges: Represent an object in an N-dimensional space
● Representation of each batch | Object Creation
● Data Cleansing / Creation of training set | Labelling
● “Distance” between batches | Distance Calculation
Inspiration:
● DNA sequence analysis → Markov Models
Machine Learning Introduction
16
Unit of analysis: batch
![Page 17: J&J Thesis Presentation July 2016](https://reader031.vdocuments.us/reader031/viewer/2022022414/5873a59b1a28aba3548b4cc9/html5/thumbnails/17.jpg)
1. Data cleansing
Value/set-point flattening, typo correction
2. Labelling of batches
Through 2 files:
● Log file: containing manual entries from operators
● Mapping table: containing information for each product
3. Time-series splitting to batches
Based on business rules derived from experience and observations; only production chunks are kept.
Solving the Challenges
17
![Page 18: J&J Thesis Presentation July 2016](https://reader031.vdocuments.us/reader031/viewer/2022022414/5873a59b1a28aba3548b4cc9/html5/thumbnails/18.jpg)
Solving the Challenges
4. Feature selection
Message Number
5. Representation of each batch
6. Unequal length time-series comparison
18
Message Mapping Table
![Page 19: J&J Thesis Presentation July 2016](https://reader031.vdocuments.us/reader031/viewer/2022022414/5873a59b1a28aba3548b4cc9/html5/thumbnails/19.jpg)
Transition Matrix Concept
19
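Sequence 1: B-B-C-A-B-C-A-C-A-B-C

Transition Matrix 1 (row = current state, column = next state):

|   | A | B | C |
|---|---|---|---|
| A | 0.00 | 0.67 | 0.33 |
| B | 0.00 | 0.25 | 0.75 |
| C | 1.00 | 0.00 | 0.00 |

Sequence 2: A-A-B-B-A-B-B-C-C-A-B-B-C-A-B-C-A-C

Transition Matrix 2 (row = current state, column = next state):

|   | A | B | C |
|---|---|---|---|
| A | 0.17 | 0.67 | 0.17 |
| B | 0.14 | 0.43 | 0.43 |
| C | 0.75 | 0.00 | 0.25 |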
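The two example transition matrices can be reproduced with a few lines of Python. This is a minimal sketch, not the authors' code: it assumes a first-order model in which entry (i, j) is the number of i → j transitions divided by the total number of transitions leaving i, and it leaves a row at zero when a state is never left (an assumption; the slides do not cover that case).

```python
def transition_matrix(sequence, states):
    """Row-normalized first-order transition matrix.

    Entry [i][j] is the fraction of transitions leaving states[i] that go
    to states[j]. Rows of states that are never left stay all-zero
    (a convention assumed here).
    """
    index = {s: i for i, s in enumerate(states)}
    counts = [[0] * len(states) for _ in states]
    # Count each consecutive (current, next) pair.
    for current, nxt in zip(sequence, sequence[1:]):
        counts[index[current]][index[nxt]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


seq1 = "B-B-C-A-B-C-A-C-A-B-C".split("-")
seq2 = "A-A-B-B-A-B-B-C-C-A-B-B-C-A-B-C-A-C".split("-")
for name, seq in (("Transition Matrix 1", seq1), ("Transition Matrix 2", seq2)):
    print(name)
    for state, row in zip("ABC", transition_matrix(seq, "ABC")):
        print(state, " ".join(f"{p:.2f}" for p in row))
```

Because each row is normalized separately, sequences of different lengths (11 and 18 symbols here) map to matrices of the same shape, which is what makes batches of unequal length comparable.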
![Page 20: J&J Thesis Presentation July 2016](https://reader031.vdocuments.us/reader031/viewer/2022022414/5873a59b1a28aba3548b4cc9/html5/thumbnails/20.jpg)
Solving the Challenges
4. Feature selection
Message Number
5. Representation of each batch
6. Unequal length time-series comparison
Chunk Object, containing:
● Transition Matrix (fixed size 45x45)
● Labels
7. Distance calculation method
An active research area
20
Transition Matrix
Message Mapping Table
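A batch ("chunk") object along these lines can be sketched as follows. The `Chunk` class, the `make_chunk` function, the message codes 101-145 and the label key are all hypothetical; `code_to_index` stands in for the message mapping table and is assumed to assign each of the 45 message numbers a fixed row/column.

```python
from dataclasses import dataclass, field

N_MESSAGES = 45  # number of distinct message codes in the data set


@dataclass
class Chunk:
    """One production batch: a fixed-size transition matrix plus its labels."""
    matrix: list
    labels: dict = field(default_factory=dict)


def make_chunk(messages, code_to_index, labels):
    """Build a Chunk from a batch's sequence of message numbers.

    `code_to_index` is a hypothetical stand-in for the message mapping
    table: it maps each message number to a fixed index 0..N_MESSAGES-1.
    """
    counts = [[0] * N_MESSAGES for _ in range(N_MESSAGES)]
    # Count consecutive (current, next) message pairs.
    for current, nxt in zip(messages, messages[1:]):
        counts[code_to_index[current]][code_to_index[nxt]] += 1
    # Normalize each row; rows of messages that are never left stay all-zero.
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return Chunk(matrix=matrix, labels=labels)


# Hypothetical message numbers 101..145 and a hypothetical label key.
code_to_index = {code: i for i, code in enumerate(range(101, 146))}
short_batch = make_chunk([101, 102, 101], code_to_index, {"product_group": "X"})
long_batch = make_chunk([101, 102, 103, 102, 101, 104], code_to_index, {"product_group": "Y"})
print(len(short_batch.matrix), len(long_batch.matrix[0]))  # 45 45
```

Both batches end up as 45x45 matrices regardless of how many messages they logged, so every chunk lives in the same fixed-dimensional space.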
![Page 21: J&J Thesis Presentation July 2016](https://reader031.vdocuments.us/reader031/viewer/2022022414/5873a59b1a28aba3548b4cc9/html5/thumbnails/21.jpg)
Distance Evaluation
Goal
● Distance {batch - batch} → Distance between 2D Transition Matrices
Problems:
1. Choosing the proper Vector Distance Metric
2. Converting 2D Transition Matrix → Vector
Solutions:
1. Distance between Vectors:
● Euclidean Distance
● Cosine Distance
● Kullback-Leibler Divergence
● Kolmogorov-Smirnov Test
● Infinity Norm
21
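The five vector distances listed above can be written as plain functions over two equal-length probability vectors, for example matching rows of two transition matrices. This is a sketch under stated assumptions: the `eps` smoothing in the KL divergence and the use of the largest gap between cumulative sums as the Kolmogorov-Smirnov statistic for discrete distributions are choices made here, not details taken from the slides. Note that the KL divergence is asymmetric, so it is not a true distance metric.

```python
import math


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def cosine_distance(p, q):
    dot = sum(a * b for a, b in zip(p, q))
    norms = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    # Convention chosen here: an all-zero vector is maximally distant (1.0).
    return 1.0 - dot / norms if norms else 1.0


def kl_divergence(p, q, eps=1e-9):
    # D_KL(P || Q) = sum_i p_i * log(p_i / q_i). Terms with p_i = 0 contribute 0;
    # eps avoids division by zero when q_i = 0 (a smoothing choice made here).
    return sum(a * math.log(a / (b + eps)) for a, b in zip(p, q) if a > 0)


def ks_statistic(p, q):
    # Largest gap between the cumulative sums of two discrete distributions
    # defined over the same ordered support.
    gap, cp, cq = 0.0, 0.0, 0.0
    for a, b in zip(p, q):
        cp += a
        cq += b
        gap = max(gap, abs(cp - cq))
    return gap


def infinity_norm(p, q):
    # L-infinity distance: largest absolute element-wise difference.
    return max(abs(a - b) for a, b in zip(p, q))


# Row A of the two example transition matrices, as exact fractions.
p = [0.0, 2 / 3, 1 / 3]
q = [1 / 6, 4 / 6, 1 / 6]
for fn in (euclidean, cosine_distance, kl_divergence, ks_statistic, infinity_norm):
    print(f"{fn.__name__}: {fn(p, q):.3f}")
```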
![Page 22: J&J Thesis Presentation July 2016](https://reader031.vdocuments.us/reader031/viewer/2022022414/5873a59b1a28aba3548b4cc9/html5/thumbnails/22.jpg)
Distance Evaluation
2. 2D Matrix → Vector: *
A. Append each row to the first
B. Append each row from the diagonal matrix to the first
C. Average of distances between corresponding rows
A.
B.
* Space-filling curves were not used, because neighboring matrix cells are not spatially related.
22
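Methods A and C can be sketched as below; method B is omitted, since the slide does not specify which entries the "diagonal matrix" variant keeps. `flatten_rows` and `row_averaged_distance` are hypothetical helper names, and any vector distance function taking two equal-length lists could be passed in place of `euclidean`.

```python
import math


def flatten_rows(matrix):
    """Method A: append every row to the first, i.e. row-major flattening."""
    return [value for row in matrix for value in row]


def row_averaged_distance(m1, m2, row_distance):
    """Method C: average of the distances between corresponding rows."""
    distances = [row_distance(r1, r2) for r1, r2 in zip(m1, m2)]
    return sum(distances) / len(distances)


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# The two example transition matrices, as exact fractions.
m1 = [[0, 2 / 3, 1 / 3], [0, 1 / 4, 3 / 4], [1, 0, 0]]
m2 = [[1 / 6, 4 / 6, 1 / 6], [1 / 7, 3 / 7, 3 / 7], [3 / 4, 0, 1 / 4]]
print("A:", euclidean(flatten_rows(m1), flatten_rows(m2)))
print("C:", row_averaged_distance(m1, m2, euclidean))
```

With the Euclidean distance, method A aggregates all squared differences before taking a single square root, while method C computes one distance per row and averages them; the latter also lets row-wise measures such as the KL divergence be applied to each probability row.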
![Page 23: J&J Thesis Presentation July 2016](https://reader031.vdocuments.us/reader031/viewer/2022022414/5873a59b1a28aba3548b4cc9/html5/thumbnails/23.jpg)
Classification (supervised)
The process of classifying objects according to shared attributes.
Algorithms used:
● Nearest Centroid
● k-Nearest Neighbors
Evaluation Methods:
● Accuracy
● Cohen’s Kappa (Kappa coefficient)
23
[Diagram: Data split into train and test sets]
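Both classifiers only need a distance function between batch vectors, so they can be sketched with the distance left pluggable. This is a minimal, dependency-free illustration and not the authors' implementation; the tie-breaking rule in `knn_predict`, the degenerate-case handling in `cohen_kappa`, and the toy labels "a"/"b" are choices made here. Cohen's kappa corrects accuracy for chance agreement: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected from the label frequencies alone.

```python
from collections import Counter


def nearest_centroid_fit(vectors, labels):
    """Average the training vectors of each class into one centroid per class."""
    sums, counts = {}, Counter(labels)
    for v, y in zip(vectors, labels):
        acc = sums.setdefault(y, [0.0] * len(v))
        for i, x in enumerate(v):
            acc[i] += x
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def nearest_centroid_predict(centroids, v, distance):
    """Assign v to the class whose centroid is closest."""
    return min(centroids, key=lambda y: distance(v, centroids[y]))


def knn_predict(vectors, labels, v, k, distance):
    """Majority vote among the k training vectors closest to v.

    Ties go to the tied class whose member appears first in distance order
    (Counter.most_common keeps first-encountered order for equal counts).
    """
    ranked = sorted(zip(vectors, labels), key=lambda pair: distance(v, pair[0]))
    votes = Counter(y for _, y in ranked[:k])
    return votes.most_common(1)[0][0]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """kappa = (p_o - p_e) / (1 - p_e), with p_e from the two label marginals."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1:
        return 1.0  # degenerate case (one label everywhere); convention chosen here
    return (p_o - p_e) / (1 - p_e)


def sq_euclidean(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


# Toy vectors with hypothetical labels "a" and "b".
train_x = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]]
train_y = ["a", "a", "b", "b"]
test_x, test_y = [[0.05, 0.1], [0.95, 0.9]], ["a", "b"]

centroids = nearest_centroid_fit(train_x, train_y)
nc_pred = [nearest_centroid_predict(centroids, v, sq_euclidean) for v in test_x]
knn_pred = [knn_predict(train_x, train_y, v, 3, sq_euclidean) for v in test_x]
print(accuracy(test_y, nc_pred), accuracy(test_y, knn_pred))  # 1.0 1.0
```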
![Page 24: J&J Thesis Presentation July 2016](https://reader031.vdocuments.us/reader031/viewer/2022022414/5873a59b1a28aba3548b4cc9/html5/thumbnails/24.jpg)
Clustering (unsupervised)
The task of grouping objects in such a way that objects in the same group (cluster) are more similar to each other than to those in other groups.
Algorithms used:
● k-Means
Evaluation Methods:
● V-Measure
● Rand Index
24
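The clustering side can be sketched in the same dependency-free way: Lloyd's k-means followed by the two evaluation scores. Assumptions: this k-means uses squared Euclidean distance and random initial centroids drawn with a fixed seed, so it does not reproduce the distance variants compared in the presentation; the V-measure follows the standard definition as the harmonic mean of homogeneity and completeness, and the Rand index is the fraction of sample pairs on which the predicted and true groupings agree. The toy points and labels are hypothetical.

```python
import math
import random
from collections import Counter


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's k-means with squared Euclidean distance; returns cluster ids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assignment = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_assignment = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


def rand_index(labels_true, labels_pred):
    """Fraction of sample pairs on which both groupings agree (same/different)."""
    n = len(labels_true)
    agree = pairs = 0
    for i in range(n):
        for j in range(i + 1, n):
            same_true = labels_true[i] == labels_true[j]
            same_pred = labels_pred[i] == labels_pred[j]
            agree += same_true == same_pred
            pairs += 1
    return agree / pairs


def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())


def conditional_entropy(a, b):
    """H(A | B) computed from the joint label counts."""
    n = len(a)
    b_counts = Counter(b)
    return -sum(c / n * math.log(c / b_counts[y]) for (_, y), c in Counter(zip(a, b)).items())


def v_measure(labels_true, labels_pred):
    """Harmonic mean of homogeneity and completeness."""
    h_c, h_k = entropy(labels_true), entropy(labels_pred)
    homogeneity = 1.0 if h_c == 0 else 1 - conditional_entropy(labels_true, labels_pred) / h_c
    completeness = 1.0 if h_k == 0 else 1 - conditional_entropy(labels_pred, labels_true) / h_k
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)


points = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]]
true_labels = ["a", "a", "b", "b"]
pred = kmeans(points, 2)
print(rand_index(true_labels, pred), v_measure(true_labels, pred))
```

Both scores compare groupings rather than label names, so it does not matter which cluster id k-means assigns to which product category.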
![Page 25: J&J Thesis Presentation July 2016](https://reader031.vdocuments.us/reader031/viewer/2022022414/5873a59b1a28aba3548b4cc9/html5/thumbnails/25.jpg)
Classification Results
25
![Page 26: J&J Thesis Presentation July 2016](https://reader031.vdocuments.us/reader031/viewer/2022022414/5873a59b1a28aba3548b4cc9/html5/thumbnails/26.jpg)
Distance Comparison | Classification
26
![Page 27: J&J Thesis Presentation July 2016](https://reader031.vdocuments.us/reader031/viewer/2022022414/5873a59b1a28aba3548b4cc9/html5/thumbnails/27.jpg)
Nearest Centroid Classifier
27
Train - Test Split Evaluation [1/2]
Classification Baseline (ZeroR):
Product Cleaning Group Accuracy: 0.520
Product Group Accuracy: 0.377
83%
65%
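The ZeroR baseline predicts the most frequent class for every test batch, so its accuracy is the share of test batches that belong to the training set's most frequent class. A minimal sketch, assuming the majority class is taken from the training split and accuracy is measured on the test split (the slides do not say which split was used); the function name and toy labels are hypothetical.

```python
from collections import Counter


def zero_r_accuracy(train_labels, test_labels):
    """ZeroR: always predict the most frequent training label; return test accuracy."""
    majority, _ = Counter(train_labels).most_common(1)[0]
    return sum(label == majority for label in test_labels) / len(test_labels)


print(zero_r_accuracy(["a", "a", "b"], ["a", "b", "b", "a"]))  # 0.5
```

A classifier is only informative to the extent that it beats this number on the same labels.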
![Page 28: J&J Thesis Presentation July 2016](https://reader031.vdocuments.us/reader031/viewer/2022022414/5873a59b1a28aba3548b4cc9/html5/thumbnails/28.jpg)
28
k-Nearest Neighbors Classifier
Train - Test Split Evaluation [2/2]
Classification Baseline (ZeroR):
Product Cleaning Group Accuracy: 0.520
Product Group Accuracy: 0.377
73%
55%
![Page 29: J&J Thesis Presentation July 2016](https://reader031.vdocuments.us/reader031/viewer/2022022414/5873a59b1a28aba3548b4cc9/html5/thumbnails/29.jpg)
Clustering Results
29
![Page 30: J&J Thesis Presentation July 2016](https://reader031.vdocuments.us/reader031/viewer/2022022414/5873a59b1a28aba3548b4cc9/html5/thumbnails/30.jpg)
Distance Comparison | Clustering
30
33%
![Page 31: J&J Thesis Presentation July 2016](https://reader031.vdocuments.us/reader031/viewer/2022022414/5873a59b1a28aba3548b4cc9/html5/thumbnails/31.jpg)
Conclusions
1) Visualization
  a) Visual Production Overview
  b) Enabling Comparison between batches
2) Machine Learning
  a) Valid Representation of Categorical Time-Series
  b) Quantification of Production Processes
  c) Application of Machine Learning Techniques
31
![Page 32: J&J Thesis Presentation July 2016](https://reader031.vdocuments.us/reader031/viewer/2022022414/5873a59b1a28aba3548b4cc9/html5/thumbnails/32.jpg)
Future Work | Academic
● Research on 2D-specific Distance Metrics
● Clustering Algorithms based on Markov Models
● Classification using Transition Matrices of different orders (Markov-0, 2, ..., N)
● Different Feature Selection (temperature, pressure, etc.)
32
![Page 33: J&J Thesis Presentation July 2016](https://reader031.vdocuments.us/reader031/viewer/2022022414/5873a59b1a28aba3548b4cc9/html5/thumbnails/33.jpg)
● Data Gathering Automation
● Creation of a Gold Standard for each Product
● Scoring of the Production Process
● Distribution of Batches compared to the Average Batch
● Clustering into more efficient clusters based on the process
Future Work | Business
33
![Page 34: J&J Thesis Presentation July 2016](https://reader031.vdocuments.us/reader031/viewer/2022022414/5873a59b1a28aba3548b4cc9/html5/thumbnails/34.jpg)
Thank you!
34
Questions?
![Page 35: J&J Thesis Presentation July 2016](https://reader031.vdocuments.us/reader031/viewer/2022022414/5873a59b1a28aba3548b4cc9/html5/thumbnails/35.jpg)
Appendix
35
![Page 36: J&J Thesis Presentation July 2016](https://reader031.vdocuments.us/reader031/viewer/2022022414/5873a59b1a28aba3548b4cc9/html5/thumbnails/36.jpg)
Distance Comparison | Classification
36
Setup:
● Algorithm:
  ○ Nearest Centroid Classifier
● Attributes:
  ○ Product Cleaning Group
  ○ Product Group
● Split:
  ○ 80% training set, 20% test set
● Distances:
  ○ All
![Page 37: J&J Thesis Presentation July 2016](https://reader031.vdocuments.us/reader031/viewer/2022022414/5873a59b1a28aba3548b4cc9/html5/thumbnails/37.jpg)
Determining k in k-Nearest Neighbors
37
Setup:
● Algorithm:
  ○ k-Nearest Neighbors
● Attributes:
  ○ Product Cleaning Group
  ○ Product Group
● Split:
  ○ 80% training set, 20% test set
● Distances (Average of):
  ○ Euclidean total
  ○ Cosine vector
  ○ KL-Divergence diagonal
![Page 38: J&J Thesis Presentation July 2016](https://reader031.vdocuments.us/reader031/viewer/2022022414/5873a59b1a28aba3548b4cc9/html5/thumbnails/38.jpg)
Train - Test Split Evaluation
38
Setup:
● Algorithm:
  ○ Nearest Centroid Classifier
  ○ k-Nearest Neighbors
● Attributes:
  ○ Product Cleaning Group
  ○ Product Group
● Split (train/test):
  ○ 80% / 20%
  ○ 65% / 35%
  ○ 50% / 50%
● Distances (Average of):
  ○ Euclidean total
  ○ Cosine vector
  ○ KL-Divergence diagonal
![Page 39: J&J Thesis Presentation July 2016](https://reader031.vdocuments.us/reader031/viewer/2022022414/5873a59b1a28aba3548b4cc9/html5/thumbnails/39.jpg)
Distance Comparison | Clustering
39
Setup:
● Algorithm:
  ○ Baseline
  ○ k-Means
● Attributes:
  ○ Product Cleaning Group
  ○ Product Group
● Initial Centroid Sets Type:
  ○ All centroids of each set belong to different clusters (Alldiff), average of 20 sets
  ○ All centroids of each set belong to the same cluster (Allsame), average of 20 sets
● Distances:
  ○ All
![Page 40: J&J Thesis Presentation July 2016](https://reader031.vdocuments.us/reader031/viewer/2022022414/5873a59b1a28aba3548b4cc9/html5/thumbnails/40.jpg)
Impact of Initial Centroids
40
Setup:
● Algorithm:
  ○ Baseline
  ○ k-Means
● Attributes:
  ○ Product Cleaning Group
  ○ Product Group
● Initial Centroid Sets Type:
  ○ All centroids of each set belong to different clusters (Alldiff), average of 100 sets
  ○ All centroids of each set belong to the same cluster (Allsame), average of 100 sets
  ○ All centroids of each set belong to a random cluster (Allrandom), average of 100 sets
● Distances (Average of):
  ○ Euclidean Total
  ○ Euclidean Row
  ○ Euclidean Column
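One reading of the three initial-centroid strategies is sketched below: each "set" is k batches used as the starting centroids of one k-means run, picked from k different true classes (Alldiff), from a single true class (Allsame), or without regard to class (Allrandom). This interpretation, the function name, the strategy strings and the toy data are assumptions; the slides only name the strategies and report averages over many sets.

```python
import random
from collections import defaultdict


def initial_centroid_set(vectors, labels, k, strategy, rng):
    """Pick k starting centroids from the data under one hypothetical strategy.

    "alldiff":   each centroid comes from a different true class
    "allsame":   all centroids come from one true class
    "allrandom": centroids are drawn regardless of class
    """
    by_class = defaultdict(list)
    for v, y in zip(vectors, labels):
        by_class[y].append(v)
    classes = sorted(by_class)
    if strategy == "alldiff":
        return [rng.choice(by_class[c]) for c in rng.sample(classes, k)]
    if strategy == "allsame":
        # Only classes with at least k members can supply k distinct centroids.
        eligible = [c for c in classes if len(by_class[c]) >= k]
        return rng.sample(by_class[rng.choice(eligible)], k)
    if strategy == "allrandom":
        return rng.sample(vectors, k)
    raise ValueError(f"unknown strategy: {strategy!r}")


# Toy data with hypothetical labels; each set would seed one k-means run.
X = [[0.0], [0.1], [1.0], [1.1], [2.0], [2.1]]
y = ["a", "a", "b", "b", "c", "c"]
rng = random.Random(0)
sets = {s: [initial_centroid_set(X, y, 2, s, rng) for _ in range(100)]
        for s in ("alldiff", "allsame", "allrandom")}
print({s: len(v) for s, v in sets.items()})
```

Running k-means once per set and averaging the resulting V-measure and Rand index per strategy would mirror the setup above.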
![Page 41: J&J Thesis Presentation July 2016](https://reader031.vdocuments.us/reader031/viewer/2022022414/5873a59b1a28aba3548b4cc9/html5/thumbnails/41.jpg)
Determining k in k-Nearest Neighbors [1/2]
41
Accuracy: average 0.727, deviation <1%
Kappa: average 0.531, deviation ~2%
![Page 42: J&J Thesis Presentation July 2016](https://reader031.vdocuments.us/reader031/viewer/2022022414/5873a59b1a28aba3548b4cc9/html5/thumbnails/42.jpg)
Determining k in k-Nearest Neighbors [2/2]
42
Accuracy: average 0.560, deviation <1%
Kappa: average 0.391, deviation ~1%
![Page 43: J&J Thesis Presentation July 2016](https://reader031.vdocuments.us/reader031/viewer/2022022414/5873a59b1a28aba3548b4cc9/html5/thumbnails/43.jpg)
Distance Comparison | Classification [2/2]
43
![Page 44: J&J Thesis Presentation July 2016](https://reader031.vdocuments.us/reader031/viewer/2022022414/5873a59b1a28aba3548b4cc9/html5/thumbnails/44.jpg)
Distance Comparison | Clustering [2/2]
44
![Page 45: J&J Thesis Presentation July 2016](https://reader031.vdocuments.us/reader031/viewer/2022022414/5873a59b1a28aba3548b4cc9/html5/thumbnails/45.jpg)
Impact of Initial Centroids [2/2]
45
![Page 46: J&J Thesis Presentation July 2016](https://reader031.vdocuments.us/reader031/viewer/2022022414/5873a59b1a28aba3548b4cc9/html5/thumbnails/46.jpg)
Labelling
1. Data cleansing
2. Labelling of batches
Object Creation
3. Time-series splitting to batches
4. Representation of each batch (chunk)
5. Feature selection
6. Unequal length time-series comparison
Distance Calculation
7. Distance calculation method
Challenges in ML
46
![Page 47: J&J Thesis Presentation July 2016](https://reader031.vdocuments.us/reader031/viewer/2022022414/5873a59b1a28aba3548b4cc9/html5/thumbnails/47.jpg)
Impact of Initial Centroids
47