quality control and improvement in manufacturing

Quality Control and Improvement

4th International Summer School
Achievements and Applications of Contemporary Informatics, Mathematics and Physics
National University of Technology of the Ukraine
Kiev, Ukraine, August 5-16, 2009

Quality Control and Improvement in Manufacturing

Gülser Köksal, Sinan Kayalıgil
Department of Industrial Engineering, METU, Ankara, Turkey

Gerhard-Wilhelm Weber, Başak Akteke-Öztürk
IAM, METU, Ankara, Turkey

Project Team

• Gülser Köksal (IE)
• Nur Evin Özdemirel (IE)
• Sinan Kayalıgil (IE)
• Bülent Karasözen (MATH, IAM)
• Gerhard Wilhelm Weber (IAM)
• İnci Batmaz (STAT)
• Murat Caner Testik (IE)
• İlker Arif İpekçi (IE)
• Berna Bakır (IS)
• Fatma Güntürkün (STAT)
• Başak Öztürk (IAM)
• Fatma Yerlikaya (IAM)

Other Collaborators:
• Esra Karasakal (IE)
• Zeev Volkovich (CS - Israel)
• Adil Bagirov (AOpt - Australia)
• Özge Uncu (IE - Canada)
• Pakize Taylan (IAM)
• Süreyya Özöğür (IAM)
• Elçin Kartal (STAT)
• Selcan Cansız (STAT & IE)

OUTLINE
• Project Objectives
• Quality Improvement (QI)
• Data Mining (DM)
• DM Applications in QI in Literature
• DM Applications in the Project
  - Casting QI Problem (Decision Trees, Neural Nets, Clustering)
  - Driver Seat Design Problem (Decision Trees)
  - PCB QI Problem (Association)
• Other approaches
  - Nonlinear/Robust Regression
• Conclusion

Project Objectives

• Determine which DM approaches can effectively be used in QI

• Test performance of DM approaches on selected quality design and improvement problems with especially voluminous data and multiple input and quality characteristics

• Develop more effective approaches to solve such problems

Project Scope

• Manufacturing industries keeping records of various input and quality characteristics

• QI problems for which traditional analysis and solution approaches are ineffective due to too many variables and complicated relationships

• “Parameter design optimization” and “quality analysis” type of quality problems

The Approach

• Collect appropriate data from different industries for different quality problems
• Apply appropriate DM techniques in solving those problems
• Compare performances of DM techniques
• Determine which DM techniques can effectively be used for which type of QI problems
• Develop new / improved algorithms

QUALITY IMPROVEMENT PROBLEMS

Quality Control and Improvement Activities

Product development stage | Quality control and improvement activity
Product design | Concept design; Parameter design (design optimization); Tolerance design
Manufacturing process design | Concept design; Parameter design (design optimization); Tolerance design
Manufacturing | Quality monitoring; Process control; Inspection / Screening; Quality analysis
Customer usage | Warranty and repair / replacement

Parameter Design Optimization

[Figure: block diagram of a PRODUCT/PROCESS with INPUT (measured, unmeasured, manipulated), Disturbance (measured, unmeasured) and OUTPUT (measured, unmeasured).]

Static problem: Find settings of manipulated input for fixed output target and minimum variability

Dynamic problem: Find settings of manipulated input for changing output targets and minimum variability

Dynamic Manufacturing Environment

[Figure: block diagram of a PROCESS with INPUT (measured, unmeasured, manipulated), Disturbance (assignable causes, noise; measured, unmeasured) and OUTPUT (measured, unmeasured).]

Goal: to have process output within target specifications with the smallest amount of variation around the target

• Statistical process control: to detect assignable causes (quality monitoring)
• Engineering process control

Static Manufacturing Environment

[Figure: block diagram of a PROCESS with INPUT (measured, unmeasured, manipulated), Disturbance (assignable causes, noise; measured, unmeasured) and OUTPUT (measured, unmeasured).]

Goal: to have process output within target specifications with the smallest amount of variation around the target

Quality analysis: measured / manipulated input → output

Quality Control and Improvement Activities: Quality Analysis

Quality Analysis consists of

- Finding characteristics critical-to-quality (CTQ)
  - Finding input variables that significantly affect quality output

- Predicting quality
  - Quality output is a real-valued variable
  - Finding empirical models that relate input characteristics of quality to output ones
  - Using such models to predict what the resulting quality characteristics will be for a given set of input parameters

- Classification of quality
  - For nominal, binary or ordinal outputs
  - For a given set of input parameters, predicting the class of the quality output

DATA MINING

Data Mining

• Data mining (knowledge discovery in databases):
  - Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) information or patterns in large databases

• What is not data mining?
  - (Deductive) query processing
  - Expert systems or small ML/statistical programs

Data mining – A KDD Process

• Data mining is the core of the KDD process

[Figure: KDD process flow. Databases → Data Cleaning, Data Integration → Data Warehouse → Data Selection → Task-relevant Data → Data Mining → Pattern Evaluation. Stage labels: Data Selection, Data Preprocessing.]

Data Mining Techniques

• Supervised Learning
  - Classification and regression
  - Decision trees
  - Neural networks
  - Support vector machines
  - Bayesian belief networks
  - Non-linear robust regression
  - Rule induction
  - Association rules
  - Rough set theory

Data Mining Techniques

• Unsupervised Learning
  - Clustering: K-means, Fuzzy C-means, Hierarchical, Mixture of Gaussians
  - Neural Networks (Self Organizing Maps)

• Outlier and deviation detection

• Trend analysis and change detection

Some Applications

• Market research and customer relationship management
• Risk analysis and management
• Fraud detection
• Text and web analysis
• Intelligent inquiry
• Process modelling
• Supply chain management

Supply Chain Management Applications

• Reducing risk of accepting bad credit cards in payments through e-commerce

• Controlling inventory by analyzing past business, monitoring present transactions, and predicting future sales

• Controlling inventory by predicting customers’ behavior patterns (e-commerce)

• CRM (clustering customers, understanding their needs and behaviors, etc.)

Source: Kusiak, A. “Data Mining in Design of Products and Production Systems”, Proceedings in INCOM 2006, Vol.1, 49-53.

SOME DM APPLICATIONS on QI PROBLEMS

• Predicting quality for given process parameter levels
• Finding optimal process parameter levels for quality
• Determining effects of equipment on quality
• Determining the effects of factors / parameters on quality
• Tolerancing
• Identifying relationships among several quality characteristics
• Determining, in a timely manner, the assignable causes that put a process out of control (unstable)

Some Applications in Literature

• Integrated circuit manufacturing
  - Fountain et al. (2000), Kusiak (2000)

• Packaging manufacturing
  - Abajo et al. (2004)

• Semiconductor wafer manufacturing
  - Gardner (2000), Kusiak (2000), Bae (2005), Chen (2004), Braha (2002), Hu (2004), Dabbas (2001), Fan (2001), Mieno (1999), Skinner (2002)

• Sheet metal assembly
  - Lian et al. (2002)

Some Applications in Literature

• Steel production
  - Cser et al. (2001)

• Chemical manufacturing
  - Shi et al. (2004), Gillblad (2001), Sun (2003)

• Ultra-precision manufacturing
  - Huang & Wu (2005)

• Conveyor belts manufacturing
  - Hou et al. (2003), Hou (2004)

• Plastic manufacturing
  - Ribeiro (2005)

LITERATURE SURVEY (DM Applications on Selected QI Problems)

[Figure: number of papers per year, 1997-2007, by QI problem type (Finding CTQs, Predicting quality, Classification of quality, Parameter optimization). Axes: Years, No. of papers.]

Literature Survey (cont.d)

[Figure: pie charts of the DM methods used in the surveyed papers. Panels: Finding CTQs; Classification of quality. Slice labels include DT, ANN, ANN-BN, ANN-SOM, RBF-NN, SVM, GA, BN, KW, AHC, RSM, CC, BA, R, ANOVA, RST and FST.]

Literature Survey (cont.d)

[Figure: pie charts of the DM methods used in the surveyed papers. Panels: Predicting quality; Parameter optimization. Slice labels include ANN, ANN-BN, ANN-RBF, TM, R, DT, FST and GA.]

QI Problems – Examples from the Project

• Casting manufacturing
• Driver seat design
• Circuit board manufacturing

CASTING QUALITY IMPROVEMENT PROBLEM – The Company

• RKN is a casting company with two factories located in Ankara

• It manufactures intermediate goods for the automotive, agricultural tractor and motor industries

• RKN applies 6σ methodologies in improving its processes

CASTING QUALITY IMPROVEMENT PROBLEM – Some Products

Transmission Cases

Gearbox

Engine Block

Oil pan

CASTING QUALITY IMPROVEMENT PROBLEM – Some Research Questions

• Is there any relation between defect types and process parameters?

• Do the important factors for different defect types interact?

• Which process parameter levels are better in reducing the defects?

DRIVER SEAT DESIGN OPTIMIZATION PROBLEM – The Company

• TFD, located in Bursa, is one of the largest automobile manufacturers in Turkey.

• They would like to improve the design of the driver seat of a commercial vehicle to increase customer satisfaction.

• The driver seat is a critical part of an automobile that affects the buying decision.

DRIVER SEAT DESIGN OPTIMIZATION PROBLEM – The Product

DRIVER SEAT DESIGN OPTIMIZATION PROBLEM – Some Research Questions

• Which customer features affect overall satisfaction with the seat?

• What are the characteristics of customers who are highly satisfied / dissatisfied with the seat?

• Which features of the seat affect overall satisfaction with the seat?

CIRCUIT BOARD QUALITY IMPROVEMENT PROBLEM – The Company

• VPC is one of the largest electronic equipment manufacturers in Turkey.

• They produce approximately 35-40 thousand PCBs per day, and 1.5-2 million PCBs per month.

• 70-80 thousand PCBs are scrapped every month.

• They would like to minimize PCB failures.

CIRCUIT BOARD QUALITY IMPROVEMENT PROBLEM – The Products

• Final products:
  - DVD player/recorder, DivX player, AV receiver, digital satellite receiver, digital TV receiver, digital media adapter

• Component of interest:
  - Various PCBs (Printed Circuit Boards) = Board + Integrated Circuits + Resistors + Capacitors + Diodes

CIRCUIT BOARD QUALITY IMPROVEMENT PROBLEM – Some Research Questions

• Which defect types occur together?
• What are the root causes of the defects?
• Do suppliers affect the defects?
• Do defects occur at certain locations on the board?

Data Mining Software Used in the Project

• SPSS Clementine

• Matlab

• Statistica QC Miner

• MARS

Decision Trees

Casting Process

METU-IE and TU/e-OPAC Workshop

[Figure: casting process layout showing the CORE SHOP, MOLDING LINE, MELTING and FETTLING SHOP.]

RKN’s Quality Objectives
• Decrease the percentage of defective items by choice of process parameters
• Priorities:
  - products suffering from a high percentage of defects
  - products with a larger share of the total tonnage, although with a lower percentage of defectives
• Decrease the percentage of product returns caused by defects that customers detect

Objectives

• Decrease the proportion of defective items (to a certain target value)

• Identify the most important process parameters affecting quality

• Find the ranges of these parameters to operate in (future direction)

• Optimize the proportion of defective items (future consideration)

Perkins 021 Cylinder Head

• Perkins 021 cylinder head is one of the two products chosen for the analysis from the second casting plant

• Reason:
  - Having problems with Perkins
  - Availability of the data
  - Volume of the data

Cylinder Head

Data Collection
• Data in RKN come from several processes and different time periods.
  - Weekly
  - Daily
  - Hourly

• Most of the data come from
  - Core shop
  - Molding
  - Melting

Data Collection (Cont...)

• Lot: total production in a day (one or more shifts)
• Daily records consist of the total volume of production, the total count of defective products and the distribution of defect types
• Response variables recorded are:
  - total number of defective products
  - number of defective products for 19 defect types
  - number of defective products returned by the customer (newly added)

Data of Core Shop
• Cores are produced according to a weekly production plan
• Cores used for a product are ready one or two days before use
• Specific core usage in a shift cannot be identified accurately
• Production may stop for a while, and even cores from 3 or more days in the past can be put to use arbitrarily

The Data
• 5 months of production data
• Number of records: 95 (averages of 95 days)
• Input: real (47)
• Output: discrete (8)
  - Can be transformed to binary, nominal or ordinal variables if needed
• Some missing data

AFTER PREPROCESSING
• 6 real uncorrelated response variables (proportions of defect types) + 1 total response (proportion of defective items)
• 36 real feature (predictor) variables
• 92 observations

Problem Settings

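x1     | x2     | y1 | y2
126.00 | 135.00 | 1  | 0
120.00 | 140.00 | 1  | 0
110.00 | 120.00 | 1  | 0
102.00 | 131.00 | 1  | 0
130.00 | 125.00 | 1  | 0
285.00 | 115.00 | 0  | 0
296.00 | 140.00 | 0  | 0
275.00 | 129.00 | 0  | 0
260.00 | 128.00 | 0  | 0
280.00 | 105.00 | 0  | 0
106.00 | 306.00 | 0  | 1
113.00 | 308.00 | 0  | 1
122.00 | 306.00 | 0  | 1
128.00 | 329.00 | 0  | 1
145.00 | 334.00 | 0  | 1
287.00 | 329.00 | 1  | 1
279.00 | 324.00 | 1  | 1
291.00 | 335.00 | 1  | 1
260.00 | 340.00 | 1  | 1
270.00 | 321.00 | 1  | 1

Columns x1, x2: features (k features in general); columns y1, y2: responses; one row per observation.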

Univariate Modeling vs. Multivariate Modeling

Univariate Decision Tree Methodology –CART (Continuous data)

DECISION TREE MODEL

(LEAST) SQUARE DEVIATION IMPURITY MEASURE

$$R(t) = \frac{1}{N(t)} \sum_{i \in t} \left( y_i - \bar{y}(t) \right)^2$$

$$\Phi(s, t) = R(t) - p_L R(t_L) - p_R R(t_R)$$

A TYPICAL RULE GENERATED

IF $X_{22} > 13.275$ AND $X_{9} > 3.095$ THEN $Y_6$ (%) $= 0.006$, Support $= 48/92$
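The least squares deviation impurity and the split improvement used by CART for continuous responses can be sketched in a few lines. This is a minimal illustration, not the SPSS Clementine implementation used in the project; the function names and the exhaustive threshold search are assumptions made for the example.

```python
from statistics import fmean


def lsd_impurity(y):
    """R(t): mean squared deviation of the responses in a node from the node mean."""
    mean = fmean(y)
    return fmean((value - mean) ** 2 for value in y)


def split_improvement(y_left, y_right):
    """Phi(s, t) = R(t) - p_L * R(t_L) - p_R * R(t_R) for one candidate split s."""
    parent = list(y_left) + list(y_right)
    p_left = len(y_left) / len(parent)
    p_right = len(y_right) / len(parent)
    return (lsd_impurity(parent)
            - p_left * lsd_impurity(y_left)
            - p_right * lsd_impurity(y_right))


def best_threshold(x, y):
    """Try every midpoint between sorted feature values; return (threshold, improvement)."""
    pairs = sorted(zip(x, y))
    best = (None, 0.0)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [response for _, response in pairs[:i]]
        right = [response for _, response in pairs[i:]]
        gain = split_improvement(left, right)
        if gain > best[1]:
            best = (threshold, gain)
    return best
```

A rule such as the one above corresponds to following the chosen thresholds from the root to a leaf and reporting the leaf mean as the predicted response.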

Research Questions
• Can we reduce the problem dimension by extracting important features only?
• Is there any relation between defect types and process parameters?
• Do the important factors for different defect types interact?
• Are there significant changes in process parameters when a defect rate is high or low?
• Which process parameter levels are better in reducing the defects?
• Is there any period when high defect rates occur specifically?
• Is there any pattern in the sequence of defect type occurrences?

Feature Reduction

• Feature selection
• Decision trees
• PCA

Univariate Decision Tree Methodology – Nominal data
• Number of records: 748
• Analysis accuracy: 93.45%
• Inputs: x32, x12, x22, x13, x2, x19, x10, x9, x36, x8, x28
• Tree depth: 9

Results for output field y (comparing $C-y with y)

Partition | 1_Training   | 2_Testing
Correct   | 699 (93.45%) | 294 (92.74%)
Wrong     | 49 (6.55%)   | 23 (7.26%)
Total     | 748          | 317

Coincidence matrix for $C-y (rows show actuals)

Partition = 1_Training
Actual | 0  | 1   | 2   | % correct
0      | 49 | 0   | 3   | 94.2
1      | 0  | 224 | 19  | 92.1
2      | 0  | 27  | 426 | 94

Partition = 2_Testing
Actual | 0  | 1   | 2
0      | 18 | 0   | 2
1      | 0  | 115 | 4
2      | 0  | 17  | 161
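The accuracy figures follow directly from the coincidence matrices: correct classifications sit on the diagonal. The short check below recomputes them from the counts reported on this slide; the helper names are illustrative only.

```python
def accuracy(matrix):
    """Share of records on the diagonal of a coincidence (confusion) matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def per_class_rate(matrix):
    """Share of each actual class (row) that was predicted correctly."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


# Counts copied from the slide; rows are actual classes 0, 1, 2.
training = [[49, 0, 3], [0, 224, 19], [0, 27, 426]]
testing = [[18, 0, 2], [0, 115, 4], [0, 17, 161]]

print(f"training accuracy: {accuracy(training):.2%}")  # 93.45%
print(f"testing accuracy: {accuracy(testing):.2%}")    # 92.74%
print([f"{rate:.1%}" for rate in per_class_rate(training)])
```

The per-class rates for the training partition reproduce the 94.2, 92.1 and 94 percent figures listed next to the matrix rows.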

Conclusion of the Casting Work

• DT induced rules were instrumental in planning new controlled experiments

• Process optimization may be sought based upon these field experiments

• DT induced rules may also be used to set tolerance levels for the uncontrollable features (variables)

Suggested Factor Levels

Factor | Controllable? | Adjusted Setting | Observed Range | Suggested Trial Range | Pertinent Defect Types | Suggested Mean Setting
x2 | No | [15, 30] | [20, 28] | [23, 28] | (y2), (y3), (y6), (y8) | if possible [23, 28]
x3 | No | [15, 30] | [30, 40] | [31, 37.5] | y1, y3 | if possible [31, 37.5]
x4 | Yes | [13, 15] | [12.171, 13.678] | [12.295, 13.678] | y1 | fixed [12.295, 13.678]
x5 | Yes | [14, 16] | [12.27, 13.66] | [12.27, 13.165] | y8 | fixed [12.27, 13.165]
x6 | Yes | [7.5, 9.5] | [7.585, 8.25] | [7.917, 8.25] | y8 | fixed [7.917, 8.25]
x8 | Yes | [35, 42] | [21.75, 42] | [21.75, 35] | y3, (y2) | fixed [21.75, 35]
x9 | Yes | [3, 3.5] | [2.98, 3.387] | not available | y2, y3, y6, y8 | 3 levels: [3.183, 3.216], [3.216, 3.26], [3.26, 3.387]
x11 | Yes | [18, 23] | [19.8, 22.9] | [20.339, 22.9] | y3 | fixed [20.339, 22.9]
x12 | Yes | [250, 400] | [290, 360] | [350, 360] | y2 | fixed [350, 360], otherwise [305, 360]
x14 | Yes | [3.5, 5.5] | [4.7, 5.2] | [4.724, 5.2] | y2 | fixed [4.724, 5.2]
x16 | No | [11, 23] | [13.2, 30] | [15.86, 30] | y1, (y2) | if possible [15.86, 30]
x17 | No | [11, 23] | [15.9, 31.5] | [26.55, 31.5] | y1 | if possible [26.55, 31.5]
x19 | No | [11, 23] | [14.1, 24.9] | not available | y2 | to be left to its own course
x20 | Yes | 40 | [38.992, 42.85] | [38.992, 41.32] | y3 | fixed [38.992, 41.32]
x21 | Yes | 50 | [48.68, 52.71] | [49.181, 52.71] | y9 | fixed [49.181, 52.71]
x22 | Yes | until March 28: 12; after March 31: 22 | until March 28: [10.85, 14.35]; after March 31: [20.05, 33.428] | not available | y1, y2, y3, y6 | 4 levels: [10.85, 13.125], [12.275, 14.35], [14.35, 17.2], [17.2, 33.42]
x25 | No | no range | [2.5, 6.9] | [2.5, 6.533] | y8 | if possible [2.5, 6.533]
x26 | Yes | [1420, 1430] | [1367.59, 1428.23] | [1367.59, 1425.98] | y8, y9 | fixed [1367.59, 1425.98]
x27 | No | no range | [2.259, 4.95] | [2.259, 4.2] | y2, (y3) | if possible [2.259, 4.2]
x28 | No | no range | [11.7, 16.9] | not available | y3, y6 | to be left to its own course
x29 | Yes | [3.2, 3.35] | [3.208, 3.41] | not available | y1, y3, y6, y8 | 3 levels: [3.208, 3.304], [3.304, 3.325], [3.355, 3.41]
x30 | Yes | [1.85, 2] | [1.823, 2] | not available | y1, y2, y3 | 2 levels: [1.823, 1.88], [1.88, 2]
x32 | Yes | [0.2, 0.3] | [0.171, 0.283] | not available | y1, y2 | 2 levels: [0.171, 0.184], [0.184, 0.283]
x33 | Yes | maximum 0.3 | [0.0767, 0.552] | [0.174, 0.552] | y2 | fixed [0.174, 0.552]
x35 | Yes | [0.08, 0.12] | [0.0762, 0.1122] | [0.088, 0.1122] | y1 | fixed [0.088, 0.1122]

June 2007, METU-IE and TU/e-OPAC Workshop

DRIVER SEAT DESIGN OPTIMIZATION PROBLEM

• Questionnaire data
• 80 observations/subjects
• 28-88 input variables (age, sex, distance travelled, anthropometric measures, ease of use, attractiveness, etc.)
• 1-53 output variables (back comfort, thigh comfort, overall satisfaction, ease of use, attractiveness, etc.)

Rules for customer satisfaction
• Rule for 7 / 7 (very satisfied) (support=4; confidence=1.0)
  If Lumbar ache after driving for a long time = 0 and Video gray as a seat cover design = 1 and Accept to pay more for the seat belt sensor = 0 and Adequate support by the seat cushion = 1 then 7.0 (very satisfied)

• Rule for 6 / 7 (satisfied) (support=10; confidence=1.0)
  If Lumbar ache after driving for a long time = 0 and Video gray as a seat cover design = 1 and Accept to pay more for the seat belt sensor = 0 then 6.0 (satisfied)

• Rule for 4 / 7 (normal) (support=8; confidence=0.75)
  If Lumbar ache after driving for a long time = 0 and Easy reach to the lumbar support adjustment = 0 then 4.0 (normal)

Neural Network Modeling

Neural Network Modeling - General

• A neural network (NN) is an interconnected group of artificial neurons that uses a mathematical or computational model for information processing based on a connectionist approach to computation.

• It incorporates learning rather than programming, and parallel rather than sequential processing.

• Neural networks resemble the human brain in two respects:
  - The network acquires knowledge from its environment using a learning process (algorithm)
  - Synaptic weights, which are inter-neuron connection strengths, are used to store the learned information.

General Topology

[Figure: network topology with an input layer, hidden layers and an output layer.]

Inside the Node

• Components:
  - Weights
  - Bias b
  - Base function (summing unit)
  - Activation function

• A node
  - Receives n inputs
  - Computes the net input according to the base function
  - Applies the activation function to the net input
  - Outputs the result

[Figure: node i. Input values x1, x2, ..., xm are multiplied by the weights w1, w2, ..., wm and combined with the bias b by the base function (∑) to give net; the activation function f(net) produces the output y.]
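The computation inside a single node can be written out directly. The sketch below assumes a weighted-sum base function and, for the example call, a logistic activation; the slides do not state which activation function the project used, so both function names and the choice of activation are illustrative.

```python
import math


def sigmoid(net):
    """Logistic activation function, one common choice for f(net)."""
    return 1.0 / (1.0 + math.exp(-net))


def node_output(inputs, weights, bias, activation=sigmoid):
    """Base function (weighted sum plus bias) followed by the activation function."""
    if len(inputs) != len(weights):
        raise ValueError("each input needs exactly one weight")
    net = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(net)


# Two inputs whose weighted sum cancels out: net = 0, so the logistic output is 0.5.
y = node_output([1.0, 2.0], [0.5, -0.25], bias=0.0)
```

A layer is a list of such nodes evaluated on the same inputs, and a network feeds the outputs of one layer to the next.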

Properties

• Capabilities
  - Fault tolerance
  - Robustness
  - Non-linear mapping
  - Learning and generalization
  - Optimization

• Issues
  - Number of source nodes
  - Number of hidden layers
  - Number of hidden nodes per hidden layer
  - Training data (too much: overfitting; too little: inaccurate classification)
  - Number of classes (sink)
  - Interconnections
  - Activation function
  - Learning technique
  - Stopping criteria

Application 1: Classification of quality in Casting

• Data:
  - 36 input variables (continuous)
  - 1 output variable (categorical with 3 levels: 1 = first defect type exists, 2 = second defect type exists, 0 = neither of these two defect types exists)

• Partition: Training -> 70%, Testing -> 30%

• Learning rule: Back-propagation

• Network Topology
  - Input layer (36 neurons)
  - Hidden layer (6 neurons)
  - Output layer (1 neuron)

• To prevent overfitting, the training set was divided again into a training and a testing set (partitioning the partition); the network was trained on the training set, and the error was evaluated on the test set at each cycle.
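The "partitioning the partition" idea can be sketched as two successive random splits plus a stopping rule on the held-out error. The 70/30 ratio for the inner split and the patience-based stopping rule are assumptions for illustration; the slides only say that the error was evaluated on the inner test set at each cycle.

```python
import random


def split_indices(indices, train_fraction, rng):
    """Shuffle the indices and cut them into a training part and a held-out part."""
    shuffled = list(indices)
    rng.shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def stopping_cycle(held_out_errors, patience=3):
    """Return the cycle with the lowest held-out error, stopping once the error
    has not improved for `patience` consecutive cycles (hypothetical rule)."""
    best_cycle, best_error = 0, float("inf")
    for cycle, error in enumerate(held_out_errors):
        if error < best_error:
            best_cycle, best_error = cycle, error
        elif cycle - best_cycle >= patience:
            break
    return best_cycle


rng = random.Random(0)
train, test = split_indices(range(100), 0.7, rng)   # outer 70/30 partition
fit, inner_test = split_indices(train, 0.7, rng)    # partition of the partition
```

The outer test set is never touched during training, so the reported testing accuracy stays an honest estimate.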

Results

• Overall predicted accuracy
  - Training: 92.56%
  - Testing: 87.01%

COINCIDENCE MATRIX FOR PREDICTED CATEGORIES

Training | 0  | 1   | 2
0        | 33 | 0   | 3
1        | 0  | 158 | 13
2        | 0  | 27  | 344

Testing | 0  | 1  | 2
0       | 18 | 0  | 0
1       | 0  | 51 | 11
2       | 0  | 19 | 132

[Figure: GAIN CHART]

Application 2: Prediction of quality in Casting

• Data:
  - 36 input variables (continuous)
  - 1 output variable (percentage of defectives for a certain defect type)

• Partition: Training -> 70%, Testing -> 30%

• Learning rule: Back-propagation

• Method: Exhaustive prune (finds the best topology)

• Final Network Topology
  - Input layer (36 neurons)
  - First hidden layer (25 neurons)
  - Second hidden layer (17 neurons)
  - Output layer (1 neuron)

Results

• Estimated accuracy: 99.95%

• Training results are slightly better than testing results (overfitting)

Statistics

Conclusion

• Neural networks can be used for both classification and prediction

• Unlike decision trees, neural networks are black-box models

• To decide on the best production regions, further study may be needed (simulation, DOE, etc.).

CLUSTERING

CLUSTERING - General

Clustering is a method by which a large data set is grouped into clusters of smaller sets of similar data.

[Figure: example demonstrating the clustering of balls.]

Clustering thus means grouping data, or dividing a large data set into smaller data sets that share some similarity.

Clustering Algorithms

A clustering algorithm attempts to find natural groups of components (or data) based on some similarity

Clustering algorithms find k clusters so that the objects of one cluster are similar to each other whereas objects of different clusters are dissimilar.

Taxonomy of Clustering Approaches

Hierarchical vs. Partitional

A hierarchical algorithm partitions the data set in a nested manner into clusters which are either disjoint or included one in another. These algorithms are either agglomerative or divisive, according to their algorithmic structure and the operations they carry out.

A partitional method assumes that the number of clusters to be found is already given and then it looks for the optimal partition based on the objective function.

Nonsmooth Optimization

Most clustering problems reduce to solving nonsmooth optimization problems.

Nonsmooth Optimization Problem:

$$\text{minimize } f(x) \quad \text{subject to } x \in X \subseteq \mathbb{R}^n$$

$f : \mathbb{R}^n \to \mathbb{R}$ is nonsmooth at many points of interest, i.e. it does not have a conventional derivative at these points.

A less restrictive class of assumptions on $f$ than smoothness: convexity and Lipschitz continuity.

Cluster Analysis via Nonsmooth Opt.

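• Given instances $A = \{a^1, a^2, \ldots, a^m\}$, $a^i \in \mathbb{R}^n$

• Problem:

$$\min \; \frac{1}{m} \sum_{i=1}^{m} \sum_{j=1}^{k} w_{ij} \, \| x^j - a^i \|^2 \quad \text{subject to} \quad \sum_{j=1}^{k} w_{ij} = 1 \ (i = 1, \ldots, m), \quad w_{ij} \in \{0, 1\}$$

• This is a clustering problem with the partitioning method. We will reformulate this as a nonsmooth optimization problem.

Cluster Analysis via Nonsmooth Opt. (cont'd)

• k is the number of clusters (given),
• m is the number of instances (given),
• $x^j \in \mathbb{R}^n$ is the j-th cluster’s center (to be found),
• $w_{ij}$ is the association weight of instance $a^i$ with cluster j (to be found): $w_{ij} = 1$ if $a^i$ is allocated to cluster j, and $w_{ij} = 0$ otherwise,
• $w = (w_{ij})$ is an $m \times k$ matrix,
• the objective function has many local minima.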

If k is not given a priori

• Start from a small enough number of clusters k and gradually increase the number of clusters for the analysis until a certain stopping criterion is met.

• This means: if the solution of the corresponding optimization problem is not satisfactory, the decision maker needs to consider a problem with k + 1 clusters, etc.

• This implies: one needs to solve the arising optimization problems repeatedly with different values of k, a task even more challenging.

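Cluster Analysis via Nonsmooth Opt. (cont'd)

Reformulated Problem:

$$\min \; f(x^1, \ldots, x^k) = \frac{1}{m} \sum_{i=1}^{m} \min_{1 \le j \le k} \| x^j - a^i \|^2, \qquad x^j \in \mathbb{R}^n$$

where $a^1, \ldots, a^m$ are the instances and $x^1, \ldots, x^k$ the cluster centers.

• A complicated objective function: nonsmooth and nonconvex. The number of variables in the reformulated nonsmooth optimization problem above is k×n; before it was (m+n)×k.

• This problem can be solved by related nonsmooth methods (e.g., Semidefinite Programming, discrete gradient method).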
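To make the reduction from (m+n)×k to k×n variables concrete, the sketch below evaluates a reformulated objective that depends on the cluster centers only. It assumes the usual minimum-sum-of-squares form, in which each instance contributes its squared distance to the nearest center and the sum is averaged over the m instances; the function names are hypothetical and no nonsmooth solver is included.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def cluster_objective(centers, instances):
    """Average over all instances of the squared distance to the nearest center.
    The inner min over centers is what makes the function nonsmooth."""
    total = sum(min(squared_distance(c, a) for c in centers) for a in instances)
    return total / len(instances)


def association_weights(centers, instances):
    """Recover the 0/1 association weights implied by the centers:
    each instance is allocated to its nearest center."""
    weights = []
    for a in instances:
        distances = [squared_distance(c, a) for c in centers]
        nearest = distances.index(min(distances))
        weights.append([1 if j == nearest else 0 for j in range(len(centers))])
    return weights


instances = [(0, 0), (0, 2), (10, 0), (10, 2)]
centers = [(0, 1), (10, 1)]
value = cluster_objective(centers, instances)        # every instance is 1 away
weights = association_weights(centers, instances)    # m x k matrix of 0/1 entries
```

Because the weights can always be recovered from the centers in this way, only the k×n center coordinates need to be optimized.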

Clustering Analysis on RKN Casting Data

We used k-means, PAM (Partitioning Around Medoids) and k-means improved by Nonsmooth Optimization to identify homogeneous groups in the data.

k-Means: The grouping is done by minimizing the sum of squares of distances between the data and the corresponding cluster centroid.

PAM: A medoid is an object of the cluster whose average distance to all the objects in the cluster is minimal.

k-Means improved by Nonsmooth Optimization: a k-means algorithm that solves a nonsmooth optimization subproblem to calculate the starting point for the k-th cluster center.
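The difference between a k-means centroid and a PAM medoid shows up clearly on a small cluster with one outlying object. The sketch below computes both for a single cluster; the data points are made up for illustration and the helper names are not taken from any of the software packages used in the project.

```python
import math


def centroid(points):
    """Coordinate-wise mean of the points; generally not one of the points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def within_cluster_sum_of_squares(points):
    """k-means criterion for one cluster: squared distances to the centroid."""
    center = centroid(points)
    return sum(math.dist(p, center) ** 2 for p in points)


def medoid(points):
    """PAM representative: the cluster object with the smallest total
    (equivalently, average) distance to all objects in the cluster."""
    return min(points, key=lambda cand: sum(math.dist(cand, p) for p in points))


cluster = [(0, 0), (1, 0), (2, 0), (3, 0), (14, 0)]
print(centroid(cluster))   # (4.0, 0.0), pulled towards the outlying object
print(medoid(cluster))     # (2, 0), an actual object of the cluster
```

This sensitivity of the centroid to outlying objects is one reason the two methods can produce quite different cluster sizes on the same data.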

Results

• k-Means:
  - k=2, cluster 1: 70 obj., cluster 2: 22 obj.
  - k=3, cluster 1: 68 obj., cluster 2: 22 obj., cluster 3: 2 obj.
  - k=4, cluster 1: 68 obj., cluster 2: 16 obj., cluster 3: 6 obj., cluster 4: 2 obj.

• PAM:
  - k=2, cluster 1: 40 obj., cluster 2: 52 obj.
  - k=3, cluster 1: 33 obj., cluster 2: 34 obj., cluster 3: 25 obj.
  - k=4, cluster 1: 20 obj., cluster 2: 34 obj., cluster 3: 25 obj., cluster 4: 13 obj.

• k-means improved by Nonsmooth Optimization:
  - k=2, cluster 1: 61 obj., cluster 2: 31 obj.
  - k=3, cluster 1: 61 obj., cluster 2: 31 obj., cluster 3: 2 obj.
  - k=4, cluster 1: 45 obj., cluster 2: 24 obj., cluster 3: 2 obj., cluster 4: 21 obj.

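Results

K-Means Clusters | PAM Cluster 1 | PAM Cluster 2 | PAM Cluster 3 | PAM Cluster 4 | Total
1                | 20            | 12            | 25            | 13            | 70
2                | 0             | 22            | 0             | 0             | 22
Total            | 20            | 34            | 25            | 13            | 92

k-Means Clusters | k-means improved by Nonsmooth Optimization Cluster 1 | Cluster 2 | Total
1                | 61                                                   | 9         | 70
2                | 0                                                    | 22        | 22
Total            | 61                                                   | 31        | 92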
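Cross-tabulations of this kind are obtained by counting, for every object, the pair of cluster labels it received from the two methods. A minimal sketch follows; the label vectors are invented for the example and are not the RKN casting data.

```python
from collections import Counter


def cross_tabulate(labels_a, labels_b):
    """Count how many objects fall into each (cluster in A, cluster in B) pair.
    Rows follow the sorted labels of A, columns the sorted labels of B."""
    counts = Counter(zip(labels_a, labels_b))
    rows = sorted(set(labels_a))
    columns = sorted(set(labels_b))
    return [[counts[(r, c)] for c in columns] for r in rows]


# Hypothetical labels for five objects from two clustering runs.
method_a = [1, 1, 1, 2, 2]
method_b = [1, 2, 2, 2, 2]
table = cross_tabulate(method_a, method_b)   # [[1, 2], [0, 2]]
```

A row with a single non-zero entry means that the cluster was carried over intact by the other method, as with k-means cluster 2 in both tables.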

The tables above show the relations between the different clustering results. The optimal partitioning with PAM is obtained for k=4, whereas for the other methods k=2 gives the best results. For k=3 and k=4 with k-means, the clusters of 2 and 6 objects are artificial.

These results agree with our preprocessing studies (Catherine Sugar’s “jump method” and PCA), which suggested that k is 2 or 4 for our data.

Jump Method and PCA

[Figure: two jump method plots. Axes: Cluster (horizontal), Transformed distortion (vertical).]

Association Rule Mining

Association Analysis
• Association rule mining searches for interesting relationships among the features in a given dataset.

• A typical example of association rule mining is “market basket analysis”.

• This process analyzes customer buying habits by finding associations between the different items that customers place in their “shopping baskets”.

Support and Confidence
• Association rules are statements in the form of

  IF antecedent(s) THEN consequent(s)

  where antecedent(s) and consequent(s) are disjoint conjunctions of feature-value pairs.

• Two common measures, support and confidence, are used to evaluate extracted rules

• For a rule defined as X=>Y
  - The support of the rule is the joint probability of X and Y, Pr(X and Y).
  - The confidence of the rule is the conditional probability of Y given X, Pr(Y|X)
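Both measures can be estimated by counting transactions. The sketch below treats every transaction as a set of items and estimates the probabilities by relative frequencies; the four transactions are made up for illustration and are not taken from the PCB data.

```python
def support(transactions, items):
    """Estimate Pr(all items present) as the share of transactions containing them."""
    items = set(items)
    return sum(items <= transaction for transaction in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate Pr(Y | X) as support(X and Y) / support(X)."""
    antecedent_support = support(transactions, antecedent)
    if antecedent_support == 0:
        raise ValueError("the antecedent never occurs in the transactions")
    joint = support(transactions, set(antecedent) | set(consequent))
    return joint / antecedent_support


# Hypothetical transactions: the set of failures observed on each board.
boards = [
    {"flash error", "display error"},
    {"flash error", "display error"},
    {"flash error"},
    {"audio error"},
]
rule_support = support(boards, {"flash error", "display error"})          # 0.5
rule_confidence = confidence(boards, {"flash error"}, {"display error"})  # 2 of 3 boards
```

A rule is usually reported only when both values exceed user-chosen minimum thresholds.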

PCB Assembly Line

PCB Assembly Line (Cont.)

PCB Manufacturing Data in Transactional Format

• In this format, a single board can appear in more than one row, each of which represents a different operation performed on this product

• The serial number can be used as the transaction ID, which distinguishes different products

• Attributes (variables) of the boards:
  - Product type
  - Description of the failure (failure observed during the final electrical test)
  - Root cause (cause of the failure identified during the repair)
  - Location of the root cause
  - Board type
  - Supplier
  - Operation line on which the failure is detected
  - Date and time
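Turning row-per-operation records into transactions amounts to grouping the rows by serial number. The sketch below uses hypothetical (serial, failure) pairs; real records would carry the further attributes listed above.

```python
from collections import defaultdict


def to_transactions(rows):
    """Group (serial number, failure) rows into one set of failures per board."""
    grouped = defaultdict(set)
    for serial, failure in rows:
        grouped[serial].add(failure)
    return dict(grouped)


# Hypothetical rows: board 101 appears twice because two failures were recorded.
rows = [
    (101, "display error"),
    (101, "AUX1 error"),
    (102, "audio error"),
]
transactions = to_transactions(rows)
```

The resulting sets, one per serial number, are the transactions on which association rules are mined.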

Attributes

• 11 types of PCB
• 38 possible failures (e.g., display error, software error, no audio, etc.)
• 13 possible root causes (e.g., chip without solder, resistance is upright, short circuit, etc.)
• Location of the root cause on the board
• 9 board types
• 6 different suppliers

Application: PCB Manufacturing

• Sample records from PCB manufacturing data

Board Type | serial | supplier | Failure | reason-of-failure | Location
1 | 2459 | GOODBOARD | display error | no solder | U45 6.PIN
1 | 736 | TATCHUN-GIA TZOONG | AUX1 error | short circuit | U8 2.PIN
4 | 990 | GIA TZOONG | device-not-work | sw | L71
3 | 700 | TATCHUN-GIA TZOONG | display error | short circuit | R407
6 | 712 | ÜNAL ELEKTRONİK | rgb-cvbs error | flash error | R412
2 | 1411 | GOODBOARD | sw error | upright | K23
2 | 663 | GOODBOARD-TATCHUN | AUX1 error | no solder | C130
7 | 627 | UNIWELL ELECTRONIC | audio error | upside-down | B353
4 | 1169 | GOODBOARD | sw error | sw | U6

Possible Applications of Association Analysis

• Identifying failure types that occur together on the same board
• Association of failures with root causes
• Association of failures with suppliers
• Identifying failures occurring in sequence
• Association of failures with the location of the root cause on the board

Identifying failure types that occurred together on the same board

• “device-not-functioning” => “flash-not-loading” (25%, 73%)

• “flash-not-loading” => “display error” (36%, 86%)

• “AUX1 error” AND “feed error” => “audio error” (32%, 61%)

Association of failures with root causes

• “upright” AND “Location” = Chip => “audio error” (46%, 82%)

• “no solder” => “device-not-functioning” (18%, 100%)

Association of failures with suppliers

• “GOODBOARD” => “display error” (23%, 57%)

• “UNIWELL” AND “GOODBOARD” => “feed error” (18%, 53%)

Identifying failures dependent on the sequence of operations

• Line 1 = “AUX1 error” => Line 5 = “feed error” (22%, 48%)

Association of failures with the location of the root cause on the board

• “device-not-functioning” => Location = “resistance” (56%, 76%)

• “flash-not-loading” => Location = “U8 2.PIN” (43%, 66%)

Regression

Regression Approaches

• MULTIPLE LINEAR REGRESSION (MLR)

• NONLINEAR REGRESSION (NLR)

• GENERAL LINEAR MODELS (GLM)

• GENERALIZED LINEAR MODELS (GLZ)

• ADDITIVE MODELS

• GENERALIZED ADDITIVE MODELS (GAM)

• ROBUST REGRESSION

CONCLUSION

• Tough QI problems with several input and output variables can be handled effectively with DM approaches.

• Observational or experimental data, preferably voluminous, are needed.

• Online data collection systems might need to be installed.

• Data quality and pre-processing are crucial.
• Many tools seem to be difficult for industry people to apply in practice (advanced training might be necessary).
• Results in the form of rules are found useful and interesting by the industry.

FUTURE WORK

• Continue collecting different data sets for different QI problems, and applications on them

• Also apply other DM approaches such as linear / robust regression, fuzzy clustering / regression and rough set theory.

• Compare performances.
• Develop new / improved DM algorithms for solving the QI problems.
  - Multi-response decision tree modeling
  - Non-smooth optimization for categorical quality responses
  - Improved MARS with Tikhonov regularization

PAPERS AND PRESENTATIONS FROM THE PROJECT

Bakır, B., Batmaz, İ., Güntürkün, F.A., İpekçi, İ.A., Köksal, G., and Özdemirel, N.E., Defect Cause Modeling with Decision Tree and Regression Analysis, Proceedings of XVII. International Conference on Computer and Information Science and Engineering, Cairo, Egypt, December 08-10, 2006, Volume 17, pp. 266-269, ISBN 975-00803-7-8.

İpekçi, A.İ., Bakır, B., Batmaz, İ., Testik, M.C., and Özdemirel, N.E., Defect Cause Modeling with Data Mining: Decision Trees and Neural Networks, to appear in Proceedings of 56th Session of the 1st International Statistical Institute, Lisbon, Portugal, August 22-29, 2007.

Akteke-Öztürk, B. and Weber, G. W., "A Survey and Results on Semidefinite and Nonsmooth Optimization for Minimum Sum of Squared Distances Problem", Technical Report, 2007.

Öztürk-Akteke, B., Weber, G.W., Kayalıgil, S., Kalite İyileştirmede Veri Kümeleme: Döküm Endüstrisinde Bir Uygulama [Data Clustering in Quality Improvement: An Application in the Casting Industry], Yöneylem Araştırması ve Endüstri Mühendisliği 27. Ulusal Kongresi (YA/EM 2007), İzmir, Turkey, July 02-04, 2007.

PAPERS AND PRESENTATIONS FROM THE PROJECT (cont.d)

Session TC-38: Tutorial Session: Data Mining Applications in Quality Improvement
22nd European Conference on Operational Research, Prague, July 7-11, 2007

Köksal, G., Testik, M.C., Güntürkün, F.A., Batmaz, İ., Data Mining Applications in Quality Improvement: A Tutorial and a Literature Review

İpekçi, A.İ., Köksal, G., Karasakal, E., Özdemirel, N.E., Testik, M.C., Multi Response Decision Tree Approach Applied To A Discrete Manufacturing Quality Improvement Problem

PAPERS AND PRESENTATIONS FROM THE PROJECT (cont.d)

Köksal, G., Testik, M.C., Güntürkün, F.A., Batmaz, İ., Kalite İyileştirmede Veri Madenciliği Yaklaşımları ve Bir Uygulama [Data Mining Approaches in Quality Improvement and an Application], 16th National Quality Congress, November 12, 2007, İstanbul.
