quality control and improvement in manufacturing
DESCRIPTION
AACIMP 2009 Summer School lecture by Gerhard Wilhelm Weber. "Modern Operational Research and Its Mathematical Methods" course.
TRANSCRIPT
Quality Control and Improvement
4th International Summer School
Achievements and Applications of Contemporary Informatics, Mathematics and Physics
National University of Technology of the Ukraine
Kiev, Ukraine, August 5-16, 2009
Quality Control and Improvement in Manufacturing
Gülser Köksal, Sinan Kayalıgil
Department of Industrial Engineering, METU, Ankara, Turkey
Gerhard-Wilhelm Weber, Başak Akteke-Öztürk
IAM, METU, Ankara, Turkey
Project Team
• Gülser Köksal (IE)
• Nur Evin Özdemirel (IE)
• Sinan Kayalıgil (IE)
• Bülent Karasözen (MATH, IAM)
• Gerhard Wilhelm Weber (IAM)
• İnci Batmaz (STAT)
• Murat Caner Testik (IE)
• İlker Arif İpekçi (IE)
• Berna Bakır (IS)
• Fatma Güntürkün (STAT)
• Başak Öztürk (IAM)
• Fatma Yerlikaya (IAM)
Other Collaborators:
• Esra Karasakal (IE)
• Zeev Volkovich (CS - Israel)
• Adil Bagirov (AOpt - Australia)
• Özge Uncu (IE - Canada)
• Pakize Taylan (IAM)
• Süreyya Özöğür (IAM)
• Elçin Kartal (STAT)
• Selcan Cansız (STAT&IE)
OUTLINE
• Project Objectives
• Quality Improvement (QI)
• Data Mining (DM)
• DM Applications in QI in Literature
• DM Applications in the Project
  - Casting QI Problem (Decision Trees, Neural Nets, Clustering)
  - Driver Seat Design Problem (Decision Trees)
  - PCB QI Problem (Association)
• Other approaches
  - Nonlinear/Robust Regression
• Conclusion
Project Objectives
• Determine which DM approaches can effectively be used in QI
• Test performance of DM approaches on selected quality design and improvement problems with especially voluminous data and multiple input and quality characteristics
• Develop more effective approaches to solve such problems
Project Scope
• Manufacturing industries keeping records of various input and quality characteristics
• QI problems for which traditional analysis and solution approaches are ineffective due to too many variables and complicated relationships
• “Parameter design optimization” and “quality analysis” type of quality problems
The Approach
• Collect appropriate data from different industries for different quality problems
• Apply appropriate DM techniques in solving those problems
• Compare performances of DM techniques
• Determine which DM techniques can effectively be used for which type of QI problems
• Develop new / improved algorithms
QUALITY IMPROVEMENT PROBLEMS
Quality Control and Improvement Activities
Product development stage | Quality control and improvement activity
Product design | Concept design; Parameter design (design optimization); Tolerance design
Manufacturing process design | Concept design; Parameter design (design optimization); Tolerance design
Manufacturing | Quality monitoring; Process control; Inspection / Screening; Quality analysis
Customer usage | Warranty and repair / replacement
Parameter Design Optimization
[Diagram: a PRODUCT/PROCESS block with manipulated, measured and unmeasured INPUT, measured and unmeasured disturbance INPUT, and measured and unmeasured OUTPUT]
Static problem: Find settings of manipulated input for fixed output target and minimum variability
Dynamic problem: Find settings of manipulated input for changing output targets and minimum variability
Dynamic Manufacturing Environment
[Diagram: a PROCESS block with manipulated, measured and unmeasured INPUT, measured and unmeasured disturbance INPUT (assignable causes, noise), and measured and unmeasured OUTPUT]
Goal: to have process output within target specifications with the smallest amount of variation around the target
• Statistical process control to detect assignable causes (quality monitoring)
• Engineering process control
Static Manufacturing Environment
[Diagram: a PROCESS block with manipulated, measured and unmeasured INPUT, measured and unmeasured disturbance INPUT (assignable causes, noise), and measured and unmeasured OUTPUT]
Goal: to have process output within target specifications with the smallest amount of variation around the target
Quality analysis: measured / manipulated input → output
Quality Control and Improvement Activities: Quality Analysis
Quality Analysis consists of
- Finding characteristics critical-to-quality (CTQ)
  - Finding input variables that significantly affect quality output
- Predicting quality
  - Quality output is a real-valued variable
  - Finding empirical models that relate input characteristics of quality to output ones
  - Using such models to predict what the resulting quality characteristics will be for a given set of input parameters
- Classification of quality
  - For nominal, binary or ordinal outputs
  - For a given set of input parameters, predicting the class of the quality output
DATA MINING
Data Mining
• Data mining (knowledge discovery in databases):
  - Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) information or patterns in large databases
• What is not data mining?
  - (Deductive) query processing
  - Expert systems or small ML/statistical programs
Data mining – A KDD Process
• Data mining is the core of the KDD process
[Diagram: Databases → Data Cleaning / Data Integration → Data Warehouse → Data Selection → Task-relevant Data → Data Mining → Pattern Evaluation]
Data Mining Techniques
• Supervised Learning
  - Classification and regression
  - Decision trees
  - Neural networks
  - Support vector machines
  - Bayesian belief networks
  - Non-linear robust regression
  - Rule induction
  - Association rules
  - Rough set theory
Data Mining Techniques
• Unsupervised Learning
  - Clustering: K-means, Fuzzy C-means, Hierarchical, Mixture of Gaussians
  - Neural Networks (Self Organizing Maps)
• Outlier and deviation detection
• Trend analysis and change detection
Some Applications
• Market research and customer relationship management
• Risk analysis and management
• Fraud detection
• Text and web analysis
• Intelligent inquiry
• Process modelling
• Supply chain management
Supply Chain Management Applications
• Reducing risk of accepting bad credit cards in payments through e-commerce
• Controlling inventory by analyzing past business, monitoring present transactions, and predicting future sales
• Controlling inventory by predicting customers’ behavior patterns (e-commerce)
• CRM (clustering customers, understanding their needs and behaviors, etc.)
Source: Kusiak, A. “Data Mining in Design of Products and Production Systems”, Proceedings in INCOM 2006, Vol.1, 49-53.
SOME DM APPLICATIONS on QI PROBLEMS
• Predicting quality for given process parameter levels
• Finding optimal process parameter levels for quality
• Determining effects of equipment on quality
• Determining the effects of factors / parameters on quality
• Tolerancing
• Identifying relationships among several quality characteristics
• Determining, in a timely manner, the assignable causes that make a process go out of control (unstable)
Some Applications in Literature
• Integrated circuit manufacturing
  - Fountain et al. (2000), Kusiak (2000)
• Packaging manufacturing
  - Abajo et al. (2004)
• Semiconductor wafer manufacturing
  - Gardner (2000), Kusiak (2000), Bae (2005), Chen (2004), Braha (2002), Hu (2004), Dabbas (2001), Fan (2001), Mieno (1999), Skinner (2002)
• Sheet metal assembly
  - Lian et al. (2002)
Some Applications in Literature
• Steel production
  - Cser et al. (2001)
• Chemical manufacturing
  - Shi et al. (2004), Gillblad (2001), Sun (2003)
• Ultra-precision manufacturing
  - Huang & Wu (2005)
• Conveyor belts manufacturing
  - Hou et al. (2003), Hou (2004)
• Plastic manufacturing
  - Ribeiro (2005)
LITERATURE SURVEY (DM Applications on Selected QI Problems)
[Charts: number of papers per year, 1997-2007, in total and broken down by QI task (Finding CTQs, Predicting quality, Classification of quality, Parameter optimization)]
Literature Survey (cont.d)
[Pie charts: number of surveyed papers by DM method for each QI task (Finding CTQs, Classification of quality, Predicting quality, Parameter optimization). Method labels include DT, ANN, ANN-BN, ANN-RBF, ANN-SOM, RBF-NN, SVM, GA, BN, R, ANOVA, RST, FST, RSM, KW, AHC, CC, BA and TM.]
QI Problems – Examples from the Project
• Casting manufacturing
• Driver seat design
• Circuit board manufacturing
CASTING QUALITY IMPROVEMENT PROBLEM – The Company
• RKN is a casting company with two factories located in Ankara
• It manufactures intermediate goods for the automotive, agricultural tractor and motor industries
• RKN applies 6σ methodologies in improving its processes
CASTING QUALITY IMPROVEMENT PROBLEM – Some Products
Transmission Cases
Gearbox
Engine Block
Oil pan
CASTING QUALITY IMPROVEMENT PROBLEM – Some Research Questions
• Is there any relation between defect types and process parameters?
• Do the important factors for different defect types interact?
• Which process parameter levels are better in reducing the defects?
DRIVER SEAT DESIGN OPTIMIZATION PROBLEM – The Company
• TFD, located in Bursa, is one of the largest automobile manufacturers in Turkey.
• They would like to improve the design of the driver seat of a commercial vehicle to increase customer satisfaction.
• The driver seat is a critical part of an automobile that affects the buying decision.
DRIVER SEAT DESIGN OPTIMIZATION PROBLEM – The Product
DRIVER SEAT DESIGN OPTIMIZATION PROBLEM – Some Research Questions
• Which customer features affect overall satisfaction with the seat?
• What are the characteristics of customers who are highly satisfied / dissatisfied with the seat?
• Which features of the seat affect overall satisfaction with the seat?
CIRCUIT BOARD QUALITY IMPROVEMENT PROBLEM – The Company
• VPC is one of the largest electronic equipment manufacturers in Turkey.
• They produce approximately 35-40 thousand PCBs per day, and 1.5-2 million PCBs per month.
• 70-80 thousand PCBs are scrapped every month.
• They would like to minimize PCB failures.
CIRCUIT BOARD QUALITY IMPROVEMENT PROBLEM – The Products
• Final products:
  - DVD player/recorder, DivX player, AV receiver, digital satellite receiver, digital TV receiver, digital media adapter
• Component of interest:
  - Various PCBs (Printed Circuit Boards) = Board + Integrated Circuits + Resistors + Capacitors + Diodes
CIRCUIT BOARD QUALITY IMPROVEMENT PROBLEM – Some Research Questions
• Which defect types occur together?
• What are the root causes of the defects?
• Do suppliers affect the defects?
• Do defects occur at certain locations on the board?
Data Mining Software Used in the Project
• SPSS Clementine
• Matlab
• Statistica QC Miner
• MARS
Decision Trees
Casting Process
METU-IE and TU/e-OPAC Workshop
[Diagram: casting process layout with CORE SHOP, MELTING, MOLDING LINE and FETTLING SHOP]
RKN’s Quality Objectives
• Decrease the percentage of defective items by choice of process parameters
• Priorities:
  - products suffering from a high percentage of defects
  - products with a larger share in the total tonnage, although with a lower percentage of defectives
• Decrease the percentage of product returns caused by defects found by customers
Objectives
• Decrease the proportion of defective items (to a certain target value)
• Identify the most important process parameters affecting quality
• Find the ranges in which to operate these parameters (future direction)
• Optimize the proportion of defective items (future consideration)
Perkins 021 Cylinder Head
• The Perkins 021 cylinder head is one of the two products chosen for the analysis from the second casting plant
• Reason:
  - Having problems with Perkins
  - Availability of the data
  - Volume of the data
Cylinder Head
Data Collection
• Data in RKN come from several processes and different time periods.
  - Weekly
  - Daily
  - Hourly
• Most of the data come from
  - Core shop
  - Molding
  - Melting
Data Collection (Cont...)
• Lot: total production in a day (one or more shifts)
• Daily records consist of the total volume of production, total count of defective products and the distribution of defect types
• Response variables recorded are:
  - total number of defective products
  - number of defective products for 19 defect types
  - number of defective products returned by the customer (newly added)
Data of Core Shop
• Cores are produced according to a weekly production plan
• Cores used for a product are ready one or two days before use
• Specific core usage in a shift cannot be identified accurately
• Production may stop for a while, and even cores from 3 or more days in the past can be put to use arbitrarily
The Data
• 5 months’ production data
• Number of records: 95 (averages of 95 days)
• Input: real (47)
• Output: discrete (8)
  - Can be transformed to binary, nominal or ordinal variables if needed
• Some missing data
AFTER PREPROCESSING
• 6 real uncorrelated response variables (proportions of defect types) + 1 total response (proportion of defective items)
• 36 real feature (predictor) variables
• 92 observations
Problem Settings
x1      x2      y1  y2
126.00  135.00  1   0
120.00  140.00  1   0
110.00  120.00  1   0
102.00  131.00  1   0
130.00  125.00  1   0
285.00  115.00  0   0
296.00  140.00  0   0
275.00  129.00  0   0
260.00  128.00  0   0
280.00  105.00  0   0
106.00  306.00  0   1
113.00  308.00  0   1
122.00  306.00  0   1
128.00  329.00  0   1
145.00  334.00  0   1
287.00  329.00  1   1
279.00  324.00  1   1
291.00  335.00  1   1
260.00  340.00  1   1
270.00  321.00  1   1
Rows are jobs; the x columns are the k features and the y columns are the responses.
Univariate Modeling vs Multivariate Modeling
Univariate Decision Tree Methodology – CART (Continuous data)
DECISION TREE MODEL: (LEAST) SQUARE DEVIATION IMPURITY MEASURE
$$R(t) = \frac{1}{N(t)} \sum_{i \in t} \left( y_i - \bar{y}(t) \right)^2$$
$$\Phi(s, t) = R(t) - p_L R(t_L) - p_R R(t_R)$$
A TYPICAL RULE GENERATED
IF X22 > 13.275 AND X9 > 3.095 THEN Y = 0.006 (Support = 48/92)
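The least-squares impurity and the split improvement used by CART can be sketched in a few lines of Python. This is only an illustration: the function names and the toy data are assumptions, not taken from the project, and a real CART implementation also handles pruning, surrogate splits and stopping rules.

```python
from statistics import fmean

def node_impurity(y):
    """R(t): mean squared deviation of the responses in a node from the node mean."""
    mean = fmean(y)
    return sum((v - mean) ** 2 for v in y) / len(y)

def split_improvement(x, y, threshold):
    """Phi(s, t) = R(t) - p_L * R(t_L) - p_R * R(t_R) for the split x <= threshold."""
    left = [yi for xi, yi in zip(x, y) if xi <= threshold]
    right = [yi for xi, yi in zip(x, y) if xi > threshold]
    if not left or not right:
        return 0.0  # a split that leaves one side empty brings no improvement
    p_left = len(left) / len(y)
    p_right = len(right) / len(y)
    return node_impurity(y) - p_left * node_impurity(left) - p_right * node_impurity(right)

def best_split(x, y):
    """Threshold on a single feature that maximizes the improvement."""
    candidates = sorted(set(x))[:-1]  # splitting at the largest value would leave the right side empty
    return max(candidates, key=lambda c: split_improvement(x, y, c))

# Hypothetical toy data: one feature, one continuous response.
x = [1, 2, 3, 10, 11, 12]
y = [0, 0, 0, 1, 1, 1]
print(best_split(x, y), split_improvement(x, y, 3))
```

Growing a tree repeats this search over all features at every node and keeps the split with the largest improvement.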
Research Questions
• Can we reduce problem dimension by extracting important features only?
• Is there any relation between defect types and process parameters?
• Do the important factors for different defect types interact?
• Are there significant changes in process parameters when a defect rate is high or low?
• Which process parameter levels are better in reducing the defects?
• Is there any period when high defect rates occur specifically?
• Is there any pattern in the sequence of defect type occurrences?
Feature Reduction
• Feature selection
• Decision trees
• PCA
Univariate Decision Tree Methodology – Nominal data
• Number of records: 748
• Analysis accuracy: 93.45%
• Inputs: x32, x12, x22, x13, x2, x19, x10, x9, x36, x8, x28
• Tree depth: 9

Results for output field y (comparing $C-y with y)
'Partition'   1_Training       2_Testing
Correct       699   93.45%     294   92.74%
Wrong          49    6.55%      23    7.26%
Total         748              317

Coincidence Matrix for $C-y (rows show actuals)
'Partition' = 1_Training   0     1     2
0                          49    0     3     94.2%
1                          0     224   19    92.1%
2                          0     27    426   94%
'Partition' = 2_Testing    0     1     2
0                          18    0     2
1                          0     115   4
2                          0     17    161
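The accuracy figures can be recomputed directly from the coincidence matrices. The sketch below uses the counts shown on the slide; the function names are illustrative and are not Clementine output.

```python
def accuracy(matrix):
    """Share of records on the diagonal (predicted class equals actual class)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

def per_class_rate(matrix):
    """For each actual class (row), the share of its records predicted correctly."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]

# Rows are actual classes 0, 1, 2; columns are predicted classes 0, 1, 2.
training = [[49, 0, 3], [0, 224, 19], [0, 27, 426]]
testing = [[18, 0, 2], [0, 115, 4], [0, 17, 161]]

print(round(100 * accuracy(training), 2))  # 93.45
print(round(100 * accuracy(testing), 2))   # 92.74
print(per_class_rate(training))
```

The per-class rates show where the tree errs: the misclassified records are mostly confusions between classes 1 and 2.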
Conclusion of the Casting Work
• DT induced rules were instrumental in planning new controlled experiments
• Process optimization may be sought based upon these field experiments
• DT induced rules may also be used to set tolerance levels for the uncontrollable features (variables)
Suggested Factor Levels
Factor | Controllable? | Adjusted Setting | Observed Range | Suggested Trial Range | Pertinent Defect Types | Suggested Mean Setting
x2 | NO | [15, 30] | [20, 28] | [23, 28] | (y2), (y3), (y6), (y8) | if possible [23, 28]
x3 | NO | [15, 30] | [30, 40] | [31, 37.5] | y1, y3 | if possible [31, 37.5]
x4 | YES | [13, 15] | [12.171, 13.678] | [12.295, 13.678] | y1 | fixed [12.295, 13.678]
x5 | YES | [14, 16] | [12.27, 13.66] | [12.27, 13.165] | y8 | fixed [12.27, 13.165]
x6 | YES | [7.5, 9.5] | [7.585, 8.25] | [7.917, 8.25] | y8 | fixed [7.917, 8.25]
x8 | YES | [35, 42] | [21.75, 42] | [21.75, 35] | y3, (y2) | fixed [21.75, 35]
x9 | YES | [3, 3.5] | [2.98, 3.387] | NOT AVAIL | y2, y3, y6, y8 | 3 levels [3.183, 3.216], [3.216, 3.26], [3.26, 3.387]
x11 | YES | [18, 23] | [19.8, 22.9] | [20.339, 22.9] | y3 | fixed [20.339, 22.9]
x12 | YES | [250, 400] | [290, 360] | [350, 360] | y2 | fixed [350, 360], otherwise [305, 360]
x14 | YES | [3.5, 5.5] | [4.7, 5.2] | [4.724, 5.2] | y2 | fixed [4.724, 5.2]
x16 | NO | [11, 23] | [13.2, 30] | [15.86, 30] | y1, (y2) | if possible [15.86, 30]
x17 | NO | [11, 23] | [15.9, 31.5] | [26.55, 31.5] | y1 | if possible [26.55, 31.5]
x19 | NO | [11, 23] | [14.1, 24.9] | NOT AVAIL | y2 | to be left to its own course
x20 | YES | 40 | [38.992, 42.85] | [38.992, 41.32] | y3 | fixed [38.992, 41.32]
x21 | YES | 50 | [48.68, 52.71] | [49.181, 52.71] | y9 | fixed [49.181, 52.71]
x22 | YES | until March 28: 12; after March 31: 22 | until March 28: [10.85, 14.35]; after March 31: [20.05, 33.428] | NOT AVAIL | y1, y2, y3, y6 | 4 levels [10.85, 13.125], [12.275, 14.35], [14.35, 17.2], [17.2, 33.42]
x25 | NO | no range | [2.5, 6.9] | [2.5, 6.533] | y8 | if possible [2.5, 6.533]
x26 | YES | [1420, 1430] | [1367.59, 1428.23] | [1367.59, 1425.98] | y8, y9 | fixed [1367.59, 1425.98]
x27 | NO | no range | [2.259, 4.95] | [2.259, 4.2] | y2, (y3) | if possible [2.259, 4.2]
x28 | NO | no range | [11.7, 16.9] | NOT AVAIL | y3, y6 | to be left to its own course
x29 | YES | [3.2, 3.35] | [3.208, 3.41] | NOT AVAIL | y1, y3, y6, y8 | 3 levels [3.208, 3.304], [3.304, 3.325], [3.355, 3.41]
x30 | YES | [1.85, 2] | [1.823, 2] | NOT AVAIL | y1, y2, y3 | 2 levels [1.823, 1.88], [1.88, 2]
x32 | YES | [0.2, 0.3] | [0.171, 0.283] | NOT AVAIL | y1, y2 | 2 levels [0.171, 0.184], [0.184, 0.283]
x33 | YES | maximum 0.3 | [0.0767, 0.552] | [0.174, 0.552] | y2 | fixed [0.174, 0.552]
x35 | YES | [0.08, 0.12] | [0.0762, 0.1122] | [0.088, 0.1122] | y1 | fixed [0.088, 0.1122]
June 2007, METU-IE and TU/e-OPAC Workshop
DRIVER SEAT DESIGN OPTIMIZATION PROBLEM
• Questionnaire data
• 80 observations/subjects
• 28-88 input variables (age, sex, distance travelled, anthropometric measures, ease of use, attractiveness, etc.)
• 1-53 output variables (back comfort, thigh comfort, overall satisfaction, ease of use, attractiveness, etc.)
Rules for customer satisfaction
• Rule for 7 / 7 (very satisfied) (support=4; confidence=1.0)
  If Lumbar ache after driving for a long time = 0 and Video gray as a seat cover design = 1 and Accept to pay more for the seat belt sensor = 0 and Adequate support by the seat cushion = 1 then 7.0 (very satisfied)
• Rule for 6 / 7 (satisfied) (support=10; confidence=1.0)
  If Lumbar ache after driving for a long time = 0 and Video gray as a seat cover design = 1 and Accept to pay more for the seat belt sensor = 0 then 6.0 (satisfied)
• Rule for 4 / 7 (normal) (support=8; confidence=0.75)
  If Lumbar ache after driving for a long time = 0 and Easy reach to the lumbar support adjustment = 0 then 4.0 (normal)
Neural Network Modeling
Neural Network Modeling - General
• A neural network (NN) is an interconnected group of artificial neurons that uses a mathematical or computational model for information processing based on a connectionist approach to computation.
• It incorporates learning rather than programming, and parallel rather than sequential processing.
• Neural networks resemble the human brain in two respects:
  - The network acquires knowledge from its environment using a learning process (algorithm)
  - Synaptic weights, which are inter-neuron connection strengths, are used to store the learned information.
General Topology
[Diagram: network with an input layer, hidden layers and an output layer]
Inside the Node
• Components:
  - Weights
  - Base function (summing unit)
  - Activation function
  - Bias b
• A node
  - Receives n inputs
  - Computes the net input according to the base function
  - Applies the activation function to the net input
  - Outputs the result
[Diagram: node i with input values x1, x2, ..., xm, weights w1, w2, ..., wm, bias b, base function ∑ producing net, and activation function f(net) producing output y]
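The computation inside a single node can be sketched as follows. The weighted sum as base function follows the slide; the logistic activation and the numeric values are assumptions chosen for illustration, since the slides do not say which activation was used.

```python
import math

def logistic(net):
    """A common activation function (an assumption here, not named in the slides)."""
    return 1.0 / (1.0 + math.exp(-net))

def node_output(inputs, weights, bias, activation=logistic):
    """Base function: weighted sum plus bias. Then apply the activation to the net input."""
    net = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(net)

# Hypothetical node with two inputs: net = 0.5 * 1.0 + (-0.25) * 2.0 + 0.0 = 0.0
print(node_output([1.0, 2.0], [0.5, -0.25], 0.0))  # 0.5
```

A layer applies the same computation to every node, and a network feeds the outputs of one layer to the next as inputs.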
Properties
• Capabilities
  - Fault tolerance
  - Robustness
  - Non-linear mapping
  - Learning and generalization
  - Optimization
• Issues
  - Number of source nodes
  - Number of hidden layers
  - Number of hidden nodes per hidden layer
  - Training data (too much: overfitting; too little: inaccurate classification)
  - Number of classes (sink)
  - Interconnections
  - Activation function
  - Learning technique
  - Stopping criteria
Application 1: Classification of quality in Casting
• Data:
  - 36 input variables (continuous)
  - 1 output variable (categorical with 3 levels: 1 = first defect type exists, 2 = second defect type exists, 0 = neither of these two defect types exists)
• Partition: Training -> 70%, Testing -> 30%
• Learning rule: Back-propagation
• Network Topology
  - Input layer (36 neurons)
  - Hidden layer (6 neurons)
  - Output layer (1 neuron)
• To prevent overfitting, the training set was divided again into a training and a testing set (partitioning the partition); the network was trained on the training set, and the error was evaluated on the test set at each cycle
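A minimal sketch of "partitioning the partition" and of stopping on the held-out error is given below. The 70/30 fractions come from the slide; the seeds, the `patience` parameter and the error sequence are hypothetical, and Clementine's internal procedure may differ.

```python
import random

def partition(indices, train_fraction, seed):
    """Randomly split record indices into a training part and a held-out part."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

def stopping_cycle(held_out_errors, patience=3):
    """Cycle with the lowest held-out error, scanning until the error has not improved for `patience` cycles."""
    best_error, best_cycle = float("inf"), -1
    for cycle, error in enumerate(held_out_errors):
        if error < best_error:
            best_error, best_cycle = error, cycle
        elif cycle - best_cycle >= patience:
            break
    return best_cycle

# Outer split: training vs testing. Inner split: the training part is partitioned again.
train, test = partition(range(100), 0.7, seed=0)
inner_train, inner_test = partition(train, 0.7, seed=1)

# Hypothetical held-out errors per training cycle; the error starts rising after cycle 2.
errors = [0.50, 0.40, 0.30, 0.35, 0.36, 0.37, 0.20]
print(len(train), len(test), len(inner_train), len(inner_test), stopping_cycle(errors))
```

The weights kept are those from the returned cycle, so the final testing partition is never used while training.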
Results
• Overall predicted accuracy
  - Training: 92.56%
  - Testing: 87.01%

COINCIDENCE MATRIX FOR PREDICTED CATEGORIES
Training   0     1     2
0          33    0     3
1          0     158   13
2          0     27    344
Testing    0     1     2
0          18    0     0
1          0     51    11
2          0     19    132

[Figure: GAIN CHART]
Application 2: Prediction of quality in Casting
• Data:
  - 36 input variables (continuous)
  - 1 output variable (percentage of defectives for a certain defect type)
• Partition: Training -> 70%, Testing -> 30%
• Learning rule: Back-propagation
• Method: Exhaustive prune (finds the best topology)
• Final Network Topology
  - Input layer (36 neurons)
  - First hidden layer (25 neurons)
  - Second hidden layer (17 neurons)
  - Output layer (1 neuron)
Results
• Estimated accuracy: 99.95%
• Training results are slightly better than testing results (overfitting)
[Figure: Statistics]
Conclusion
• Neural networks can be used for both classification and prediction
• Unlike decision trees, neural networks are black-box models
• To decide on the best production regions, further study may be needed (simulation, DOE, etc.).
CLUSTERING
CLUSTERING - General
Clustering is a method by which large data sets are grouped into clusters of smaller sets of similar data.
[Figure: example demonstrating the clustering of balls]
In other words, clustering groups data, or divides a large data set into smaller data sets, according to some similarity.
Clustering Algorithms
A clustering algorithm attempts to find natural groups of components (or data) based on some similarity
Clustering algorithms find k clusters so that the objects of one cluster are similar to each other whereas objects of different clusters are dissimilar.
Taxonomy of Clustering Approaches
Hierarchical vs. Partitional
A hierarchical algorithm partitions the data set in a nested manner into clusters which are either disjoint or included one in another. These algorithms are either agglomerative or divisive, according to their algorithmic structure and the operations they carry out.
A partitional method assumes that the number of clusters to be found is already given and then it looks for the optimal partition based on the objective function.
Nonsmooth Optimization
Most clustering problems reduce to solving nonsmooth optimization problems.
Nonsmooth Optimization Problem:
$$\min_{x} \; f(x) \quad \text{subject to} \quad x \in X \subseteq \mathbb{R}^n$$
$f : \mathbb{R}^n \to \mathbb{R}$ is nonsmooth at many points of interest, that is, it does not have a conventional derivative at these points.
A less restrictive class of assumptions than smoothness: convexity and Lipschitz continuity.
Cluster Analysis via Nonsmooth Opt.
• Given instances $a^1, a^2, \ldots, a^m \in \mathbb{R}^n$
• Problem:
$$\min \; \frac{1}{m} \sum_{i=1}^{m} \sum_{j=1}^{k} w_{ij} \, \| x^j - a^i \|^2 \quad \text{subject to} \quad \sum_{j=1}^{k} w_{ij} = 1 \;\; (i = 1, \ldots, m), \quad w_{ij} \in \{0, 1\}$$
• This is a clustering problem posed with the partitioning method. We will reformulate it as a nonsmooth optimization problem.

Cluster Analysis via Nonsmooth Opt. (Cont’d)
• k is the number of clusters (given),
• m is the number of instances (given),
• $x^j \in \mathbb{R}^n$ is the j-th cluster’s center (to be found),
• $w_{ij}$ is the association weight of instance $a^i$ with cluster j (to be found): $w_{ij} = 1$ if instance $a^i$ is allocated to cluster j, and $w_{ij} = 0$ otherwise,
• $(w_{ij})$ is an $m \times k$ matrix,
• the objective function has many local minima.

Cluster Analysis via Nonsmooth Opt. (Cont’d): if k is not given a priori
• Start from a small enough number of clusters k and gradually increase the number of clusters for the analysis until a certain stopping criterion is met.
• This means: if the solution of the corresponding optimization problem is not satisfactory, the decision maker needs to consider a problem with k + 1 clusters, etc.
• This implies: one needs to solve the arising optimization problems repeatedly, with different values of k, which is an even more challenging task.

Cluster Analysis via Nonsmooth Opt. (Cont’d)
Reformulated Problem:
$$\min \; f(x^1, \ldots, x^k) = \frac{1}{m} \sum_{i=1}^{m} \min_{j = 1, \ldots, k} \| x^j - a^i \|^2$$
• A complicated objective function: nonsmooth and nonconvex. The number of variables in the reformulated nonsmooth optimization problem above is k×n; before, it was (m+n)×k.
• This problem can be solved by related nonsmooth methods (e.g., Semidefinite Programming, discrete gradient method).
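The relation between the two formulations can be checked numerically. The sketch below assumes the minimum sum-of-squares objective with a 1/m scaling and uses made-up points and centers; it only evaluates the objectives and does not implement the discrete gradient method or any solver.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def weighted_objective(points, centers, weights):
    """Original form: (1/m) * sum_i sum_j w_ij * ||x_j - a_i||^2 with 0/1 weights."""
    m = len(points)
    total = sum(
        weights[i][j] * sq_dist(centers[j], points[i])
        for i in range(m)
        for j in range(len(centers))
    )
    return total / m

def nonsmooth_objective(points, centers):
    """Reformulated form: (1/m) * sum_i min_j ||x_j - a_i||^2, a function of the centers only."""
    return sum(min(sq_dist(c, a) for c in centers) for a in points) / len(points)

def nearest_assignment(points, centers):
    """0/1 weight matrix (m x k) that allocates each instance to its closest center."""
    weights = []
    for a in points:
        dists = [sq_dist(c, a) for c in centers]
        closest = dists.index(min(dists))
        weights.append([1 if j == closest else 0 for j in range(len(centers))])
    return weights

# Hypothetical data: m = 4 instances in R^2, k = 2 centers.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centers = [(0.0, 1.0), (10.0, 1.0)]
w = nearest_assignment(points, centers)
print(weighted_objective(points, centers, w), nonsmooth_objective(points, centers))
```

For fixed centers the best 0/1 weights are the nearest-center assignment, which is why the weights can be eliminated and only the k×n center coordinates remain as variables.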
Clustering Analysis on RKN Casting Data
We used k-means, PAM (Partitioning Around Medoids) and k-means improved by Nonsmooth Optimization to identify homogeneous groups in the data.
k-Means: The grouping is done by minimizing the sum of squares of distances between data and the corresponding cluster centroid.
PAM: A medoid is an object of the cluster whose average distance to all the objects in the cluster is minimal.
k-Means improved by Nonsmooth Optimization: a k-means algorithm that solves a nonsmooth optimization subproblem for calculating the starting point for the k-th cluster center.
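The difference between a centroid and a medoid can be illustrated with a small sketch. The points are hypothetical and the Euclidean distance is an assumption; this is not the full PAM algorithm, which also swaps medoids between clusters.

```python
import math

def centroid(cluster):
    """Coordinate-wise mean of the cluster; it need not be one of the objects."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))

def medoid(cluster):
    """Object of the cluster whose average distance to all objects in the cluster is minimal."""
    def average_distance(candidate):
        return sum(math.dist(candidate, other) for other in cluster) / len(cluster)
    return min(cluster, key=average_distance)

# Hypothetical cluster with one outlying object.
cluster = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (20.0, 0.0)]
print(centroid(cluster))  # pulled towards the outlier
print(medoid(cluster))    # stays on an actual object near the bulk of the data
```

Because the medoid must be an observed object, a single extreme observation moves it far less than it moves the centroid.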
Results
• k-Means:
  - k=2, cluster 1: 70 obj., cluster 2: 22 obj.
  - k=3, cluster 1: 68 obj., cluster 2: 22 obj., cluster 3: 2 obj.
  - k=4, cluster 1: 68 obj., cluster 2: 16 obj., cluster 3: 6 obj., cluster 4: 2 obj.
• PAM:
  - k=2, cluster 1: 40 obj., cluster 2: 52 obj.
  - k=3, cluster 1: 33 obj., cluster 2: 34 obj., cluster 3: 25 obj.
  - k=4, cluster 1: 20 obj., cluster 2: 34 obj., cluster 3: 25 obj., cluster 4: 13 obj.
• k-means improved by Nonsmooth Optimization:
  - k=2, cluster 1: 61 obj., cluster 2: 31 obj.
  - k=3, cluster 1: 61 obj., cluster 2: 31 obj., cluster 3: 2 obj.
  - k=4, cluster 1: 45 obj., cluster 2: 24 obj., cluster 3: 2 obj., cluster 4: 21 obj.
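Results
k-Means clusters (rows) versus PAM clusters (columns):
k-Means cluster | PAM 1 | PAM 2 | PAM 3 | PAM 4 | Total
1 | 20 | 12 | 25 | 13 | 70
2 | 0 | 22 | 0 | 0 | 22
Total | 20 | 34 | 25 | 13 | 92

k-Means clusters (rows) versus clusters of k-means improved by Nonsmooth Optimization (columns):
k-Means cluster | 1 | 2 | Total
1 | 61 | 9 | 70
2 | 0 | 22 | 22
Total | 61 | 31 | 92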
In the tables above, we show the relations between the different clustering results. The optimal partitioning with PAM is obtained for k=4, whereas for the other methods k=2 gives the best results. For k=3 and k=4 with k-means, the clusters of 2 and 6 objects are artificial.
These results match our preprocessing studies (Catherine Sugar’s “jump method” and PCA), which suggested that k is 2 or 4 in our data.
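Comparing two clusterings of the same 92 observations amounts to cross-tabulating their labels. The sketch below builds synthetic label vectors that reproduce the k-means versus nonsmooth-improved k-means counts reported for k=2; the actual per-observation labels are not available here, so the ordering of the labels is an assumption.

```python
from collections import Counter

def cross_tabulate(labels_a, labels_b):
    """Contingency table: rows are clusters of the first method, columns those of the second."""
    counts = Counter(zip(labels_a, labels_b))
    rows = sorted(set(labels_a))
    cols = sorted(set(labels_b))
    return [[counts[(r, c)] for c in cols] for r in rows]

# Synthetic labels consistent with the reported counts (70/22 and 61/31 objects).
kmeans_labels = [1] * 70 + [2] * 22
nonsmooth_labels = [1] * 61 + [2] * 9 + [2] * 22

table = cross_tabulate(kmeans_labels, nonsmooth_labels)
print(table)  # [[61, 9], [0, 22]]
```

Off-diagonal cells show the objects on which the two methods disagree, here the 9 objects that k-means places in cluster 1 and the improved method places in cluster 2.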
Jump Method and PCA
[Figure: plots of transformed distortion against the number of clusters]
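The idea behind the jump method can be sketched as follows: the distortion d_k obtained with k clusters is transformed to d_k^(-Y), and the k with the largest increase ("jump") in transformed distortion is selected. The exponent Y = p/2 (with p the data dimension) is the commonly suggested choice and is an assumption here, as are the distortion values, which are invented for illustration.

```python
def jump_method(distortions, dimension):
    """distortions[k - 1] is the distortion d_k obtained with k = 1, 2, ... clusters."""
    power = dimension / 2.0
    # The transformed distortion for k = 0 is defined as 0.
    transformed = [0.0] + [d ** (-power) for d in distortions]
    jumps = [transformed[k] - transformed[k - 1] for k in range(1, len(transformed))]
    best_k = max(range(1, len(jumps) + 1), key=lambda k: jumps[k - 1])
    return best_k, jumps

# Hypothetical distortions for k = 1..4 in a 2-dimensional data set.
best_k, jumps = jump_method([4.0, 1.0, 0.9, 0.85], dimension=2)
print(best_k, jumps)
```

A second large jump at another k, as in the project data where both 2 and 4 were suggested, indicates more than one plausible number of clusters.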
Association Rule Mining
Association Analysis
• Association rule mining searches for interesting relationships among the features in a given dataset.
• A typical example of association rule mining is “market basket analysis”.
• This process analyzes customer buying habits by finding associations between the different items that customers place in their “shopping baskets”.
Support and Confidence
• Association rules are statements in the form of
  IF antecedent(s) THEN consequent(s)
  where antecedent(s) and consequent(s) are disjoint conjunctions of feature-value pairs.
• Two common measures, support and confidence, are used to evaluate extracted rules
• For a rule defined as X=>Y
  - The support of the rule is the joint probability of X and Y, Pr(X and Y).
  - The confidence of the rule is the conditional probability of Y given X, Pr(Y|X)
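Both measures can be estimated from transaction data by counting. The sketch below treats each board as a set of observed failures; the four transactions are invented for illustration and are not project data.

```python
def support(transactions, items):
    """Estimate of Pr(items): share of transactions that contain every item."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimate of Pr(Y | X) = Pr(X and Y) / Pr(X) for the rule X => Y."""
    joint = support(transactions, set(antecedent) | set(consequent))
    return joint / support(transactions, antecedent)

# Hypothetical transactions: the set of failures observed on each board.
boards = [
    {"flash-not-loading", "display error"},
    {"flash-not-loading", "display error"},
    {"flash-not-loading"},
    {"audio error"},
]

rule_support = support(boards, {"flash-not-loading", "display error"})
rule_confidence = confidence(boards, {"flash-not-loading"}, {"display error"})
print(rule_support, rule_confidence)
```

A rule is usually reported only when both values exceed user-chosen minimum thresholds.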
PCB Assembly Line
PCB Assembly Line (Cont.)
PCB Manufacturing Data in Transactional Format
• In this format, a single board can appear in more than one row, each of which represents a different operation performed on this product
• The serial number can be used as the transaction ID, which distinguishes different products
• Attributes (variables) of the boards:
  - Product type
  - Description of the failure (failure observed during the final electrical test)
  - Root cause (cause of the failure identified during the repair)
  - Location of the root cause
  - Board type
  - Supplier
  - Operation line where the failure is detected
  - Date and time
Attributes
• 11 types of PCB
• 38 possible failures (e.g., display error, software error, no audio, etc.)
• 13 possible root causes (e.g., chip without solder, resistance is upright, short circuit, etc.)
• Location of the root cause on the board
• 9 board types
• 6 different suppliers
Application: PCB Manufacturing
• Sample records from PCB manufacturing data
Board Type | serial | supplier | Failure | reason-of-failure | Location
1 | 2459 | GOODBOARD | display error | no solder | U45 6.PIN
1 | 736 | TATCHUN-GIA TZOONG | AUX1 error | short circuit | U8 2.PIN
4 | 990 | GIA TZOONG | device-not-work | sw | L71
3 | 700 | TATCHUN-GIA TZOONG | display error | short circuit | R407
6 | 712 | ÜNAL ELEKTRONİK | rgb-cvbs error | flash error | R412
2 | 1411 | GOODBOARD | sw error | upright | K23
2 | 663 | GOODBOARD-TATCHUN | AUX1 error | no solder | C130
7 | 627 | UNIWELL ELECTRONIC | audio error | upside-down | B353
4 | 1169 | GOODBOARD | sw error | sw | U6
Possible Applications of Association Analysis
• Identifying failure types that take place together on the same board.
• Association of failures with root cause.
• Association of failures with suppliers.
• Identifying failures occurring in sequence.
• Association of failures with the location of the root cause on the board
Identifying failure types that occurred together on the same board
• “device-not-functioning” => “flash-not-loading” (25%, 73%)
• “flash-not-loading” => “display error” (36%, 86%)
• “AUX1 error” AND “feed error” => “audio error” (32%, 61%)
Association of failures with root causes
• “upright” AND “Location” = Chip => “audio error” (46%, 82%)
• “no solder” => “device-not-functioning” (18%, 100%)
Association of failures with suppliers
• “GOODBOARD” => “display error” (23%, 57%)
• “UNIWELL” AND “GOODBOARD” => “feed error” (18%, 53%)
Identifying failures dependent on the sequence of operations
• Line 1 = “AUX1 error” => Line 5 = “feed error” (22%, 48%)
Association of failures with the location of the root cause on the board
• “device-not-functioning” => Location = “resistance” (56%, 76%)
• “flash-not-loading” => Location = “U8 2.PIN” (43%, 66%)
Regression
Regression Approaches
• MULTIPLE LINEAR REGRESSION (MLR)
• NONLINEAR REGRESSION (NLR)
• GENERAL LINEAR MODELS (GLM)
• GENERALIZED LINEAR MODELS (GLZ)
• ADDITIVE MODELS
• GENERALIZED ADDITIVE MODELS (GAM)
• ROBUST REGRESSION
CONCLUSION
• Tough QI problems with several input and output variables can be handled effectively with DM approaches.
• Observational or experimental data, preferably voluminous, are needed.
• Online data collection systems might need to be installed
• Data quality and pre-processing are crucial
• Many tools seem to be difficult for industry people to apply in practice (advanced training might be necessary)
• Results in the form of rules are found useful and interesting by the industry
FUTURE WORK
• Continue collecting different data sets for different QI problems, and applications on them
• Also apply other DM approaches such as linear / robust regression, fuzzy clustering / regression and rough set theory.
• Compare performances.
• Develop new / improved DM algorithms for solving the QI problems.
  - Multi-response decision tree modeling
  - Non-smooth optimization for categorical quality responses
  - Improved MARS with Tikhonov regularization
PAPERS AND PRESENTATIONS FROM THE PROJECT
Bakır, B., Batmaz, İ., Güntürkün, F.A., İpekçi, İ.A., Köksal, G., and Özdemirel, N.E., Defect Cause Modeling with Decision Tree and Regression Analysis, Proceedings of XVII. International Conference on Computer and Information Science and Engineering, Cairo, Egypt, December 08-10, 2006, Volume 17, pp. 266-269, ISBN 975-00803-7-8.
İpekçi, A.İ., Bakır, B., Batmaz, İ., Testik, M.C., and Özdemirel, N.E., Defect Cause Modeling with Data Mining: Decision Trees and Neural Networks, to appear in Proceedings of 56th Session of the 1st International Statistical Institute, Lisbon, Portugal, August 22-29, 2007.
Akteke-Öztürk, B. and Weber, G. W., "A Survey and Results on Semidefinite and Nonsmooth Optimization for Minimum Sum of Squared Distances Problem", Technical Report, 2007.
Öztürk-Akteke, B., Weber, G.W., Kayalıgil, S., Kalite İyileştirmede Veri Kümeleme: Döküm Endüstrisinde Bir Uygulama [Data Clustering in Quality Improvement: An Application in the Casting Industry], Yöneylem Araştırması ve Endüstri Mühendisliği 27. Ulusal Kongresi (YA/EM 2007) [27th National Congress on Operations Research and Industrial Engineering], İzmir, Turkey, July 02-04, 2007.
PAPERS AND PRESENTATIONS FROM THE PROJECT (cont.d)
Session TC-38: Tutorial Session: Data Mining Applications in Quality Improvement
22nd European Conference on Operational Research, Prague, July 7-11, 2007
Köksal, G., Testik, M.C., Güntürkün, F.A., Batmaz, İ., Data Mining Applications in Quality Improvement: A Tutorial and a Literature Review
İpekçi, A.İ., Köksal, G., Karasakal, E., Özdemirel, N.E., Testik, M.C., Multi Response Decision Tree Approach Applied To A Discrete Manufacturing Quality Improvement Problem
PAPERS AND PRESENTATIONS FROM THE PROJECT (cont.d)
Köksal, G., Testik, M.C., Güntürkün, F.A., Batmaz, İ., Kalite İyileştirmede Veri Madenciliği Yaklaşımları ve Bir Uygulama [Data Mining Approaches in Quality Improvement and an Application], 16th National Quality Congress, November 12, 2007, İstanbul.