Multivariate Analysis of Manufacturing Data
by
Ronald Cao
Submitted to the Department of Electrical Engineering and Computer Science
in partial fulfillment of the requirements for the degree of
Master of Engineering and Bachelor of Science in Electrical Engineering
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
June 1997
© Massachusetts Institute of Technology 1997. All rights reserved.
The author hereby grants to MIT permission to reproduce and distribute publicly paper and electronic copies of this thesis document in whole or in
part, and to grant others the right to do so.
Author ........................................................................
Department of Electrical Engineering and Computer Science
May 23, 1997

Certified by ...................................................................
David H. Staelin
Professor of Electrical Engineering
Thesis Supervisor

Accepted by ....................................................................
Frederic R. Morgenthaler
Chairman, Department Committee on Graduate Students
Multivariate Analysis of Manufacturing Data
by
Ronald Cao
Submitted to the Department of Electrical Engineering and Computer Science
on May 23, 1997, in partial fulfillment of the
requirements for the degree of
Master of Engineering and Bachelor of Science in Electrical Engineering
Abstract
With the advancement of technology, manufacturing systems have become increasingly complex. Currently, many continuous-time manufacturing processes are operated by a complicated array of computers which monitor thousands of control variables. It has become more difficult for managers and operators to determine sources of parameter variation and to control and maintain the efficiency of their manufacturing processes.

The goal of this thesis is to present a sequence of multivariate analysis techniques that can be applied to the analysis of information-rich data sets from web manufacturing processes. The focus is on three main areas: identifying outliers, determining relationships among variables, and grouping variables. The questions asked are 1) how to effectively separate outliers from the main population? 2) how to determine correlations among variables or subprocesses? and 3) what are the best methods to categorize and group physically significant variables within a multivariate manufacturing data set?

Results of various experiments focused on the above three areas include 1) both normalized Euclidean distance and principal component analysis are effective in separating the outliers from the main population, 2) correlation analysis of Poisson-distributed defect densities shows the difficulties in determining the true correlation between variables, and 3) both principal component analysis with a robust correlation matrix and principal component analysis with frequency-filtered variables are effective in grouping variables. Hopefully these results can lead to more comprehensive research in the general area of data analysis of manufacturing processes in the future.
Thesis Supervisor: David H. Staelin
Title: Professor of Electrical Engineering
Acknowledgments
It has been an incredibly meaningful and fulfilling five years at MIT. The following are just some of the names of the people who have made tremendous contributions to my intellectual development and my personal growth.

* My advisor, Professor David Staelin, who provided me with the guidance that I needed on my research and thesis. He has inspired me with insightful ideas and thought-provoking concepts. In addition, he has given me the freedom to explore my ideas as well as many valuable suggestions for experimentation.

* My lab partners: Junehee Lee, Michael Shwartz, Carlos Caberra, and Bill Blackwell. Many thanks to Felicia Brady.

* Dean Bonnie Walters, Professor George W. Pratt, Professor Kirk Kolenbrander, Professor Lester Thurow, and Deborah Ullrich.

* All the friends I have made through my college life, especially my good friends David Steel and Jake Seid and the brothers of Lambda Chi Alpha Fraternity.

* Most of all, I would like to thank my parents for their endless love and support. They have been there through every phase of my personal and professional development. Thank you!
Contents
1 Introduction 17
1.1 Background 17
1.2 Previous Work 18
1.3 Objective 19
1.4 Thesis Organization 19

2 Basic Analysis Tools 21
2.1 Data Set 21
2.2 Preprocessing 22
2.2.1 Missing Data 22
2.2.2 Constant Variables 22
2.2.3 Normalization 22
2.3 Outlier Analysis 24
2.3.1 Definition 24
2.3.2 Causes of Outliers 24
2.3.3 Effects of Outliers 25
2.3.4 Outlier Detection 25
2.4 Correlation Analysis 26
2.5 Spectral Analysis 28
2.6 Principal Component Analysis 29
2.6.1 Basic Concept 29
2.6.2 Geometric Representation 29
2.6.3 Mathematical Definition 31

3 Web Process 1 35
3.1 Background 35
3.2 Data 35
3.3 Preprocessing 36
3.4 Feature Characterization 36
3.4.1 In-Line Data 36
3.4.2 End-of-Line Data 37
3.5 Correlation Analysis 39
3.5.1 Streak-Streak Correlation 39
3.5.2 Streak-Cloud Correlation 41
3.5.3 Interpretation 42
3.6 Poisson Distribution 43
3.6.1 Method 43
3.6.2 Results 44
3.7 Principal Component Analysis 45
3.7.1 PCA of the End-of-Line Data 45
3.7.2 PCA of the In-Line Data 45
3.7.3 Interpretation 47

4 Web Process 2 49
4.1 Background 49
4.2 Data 50
4.3 Preprocessing 50
4.4 Feature Characterization 50
4.4.1 Quality Variables 50
4.4.2 In-Line Variables 51
4.5 Outlier Analysis 52
4.5.1 Normalized Euclidean Distance 52
4.5.2 Time Series Model - PCA 53
4.5.3 Identifying Outlying Variables 58
4.6 Variable Grouping 62
4.6.1 Principal Component Analysis 62
4.6.2 PCA with Robust Correlation Matrix 64
4.6.3 PCA with Frequency-Filtered Variables 72

5 Conclusion and Suggested Work 79
List of Figures
2-1 (a) Plot in Original Axes (b) Plot in Transformed Axes 30
3-1 Time-Series Behavior of a Typical In-Line Variable 37
3-2 Cross-Web Position of Defects Over Time 37
3-3 Streak and Cloud Defects 38
3-4 The 10 Densest Streaks Over Time 40
3-5 Correlation Coefficients Between Streaks Using (a) Standard Time Block, (b) Double-Length Time Block, (c) Quadruple-Length Time Block 41
3-6 Correlation Coefficients Between Streak and Cloud with Time Blocks of Length 1 to Length 100 42
3-7 (a) Cloud Distribution Using Fixed-Length Time Blocks, (b) Ideal Poisson Distribution Using the Same Fixed-Length Time Blocks 43
3-8 Distributions of Cloud Defects Using x2, x4, and x6 Time Blocks 44
3-9 Ideal Poisson Distributions Using x2, x4, and x6 Time Blocks 44
3-10 First 3 Principal Components of the End-of-Line Data 46
3-11 Percent of Variance Captured by PCs 46
3-12 First 4 PCs of the In-Line Data 47
4-1 Ten Types of In-Line Variable Behavior 51
4-2 Normalized Euclidean Distance 53
4-3 Outlier and Normal Behavior Based on Normalized Euclidean Distance 54
4-4 First Ten Principal Components of Web Process 2 Data Set 55
4-5 High-Pass Filters with Wp = 0.1 and Wp = 0.3 56
4-6 First Ten Principal Components from 90% High-Pass Filtered Data 57
4-7 First Ten Principal Components from 70% High-Pass Filtered Data 57
4-8 Variables Identified that Contribute to Transient Outliers in Regions 1 and 4 60
4-9 The First Principal Component and the Corresponding Eigenvector from Process 2 Data 60
4-10 The First Ten Principal Components from 738 Variables 61
4-11 The First Ten Eigenvectors from 738 Variables 61
4-12 First 10 Eigenvectors of 738 Variables 63
4-13 Histograms of the First 10 Eigenvectors of 738 Variables 63
4-14 Magnitude of Correlation Coefficients of 738 Variables in Descending Order in (a) Normal Scale, (b) Log Scale 65
4-15 First 10 Eigenvectors Calculated from Robust Correlation Matrix with Cutoff = 0.06 66
4-16 Histograms of the First 10 Eigenvectors Calculated from Robust Correlation Matrix with Cutoff = 0.06 66
4-17 First 10 Eigenvectors Calculated from Robust Correlation Matrix with Cutoff = 0.10 67
4-18 Histograms of the First 10 Eigenvectors Calculated from Robust Correlation Matrix with Cutoff = 0.10 67
4-19 First 10 Eigenvectors Calculated from Robust Correlation Matrix with Cutoff = 0.15 68
4-20 Histograms of the First 10 Eigenvectors Calculated from Robust Correlation Matrix with Cutoff = 0.15 68
4-21 First 10 Eigenvectors Calculated from Robust Correlation Matrix with Cutoff = 0.18 69
4-22 Histograms of the First 10 Eigenvectors Calculated from Robust Correlation Matrix with Cutoff = 0.18 69
4-23 A Comparison of Eigenvectors Calculated from (a) Original Correlation Matrix, (b) Robust Correlation Matrix with Cutoff = 0.06, (c) Robust Correlation Matrix with Cutoff = 0.10, (d) Robust Correlation Matrix with Cutoff = 0.15, (e) Robust Correlation Matrix with Cutoff = 0.18 71
4-24 A Comparison of Histograms of the Eigenvectors 71
4-25 (a) High-Pass Filter with Wp = 0.3 (b) Band-Pass Filter with Wp = [0.2, 0.4] 72
4-26 First 10 Eigenvectors Calculated from High-Pass Filtered (Wp = 0.1) Variables 73
4-27 Histograms of First 10 Eigenvectors Calculated from High-Pass Filtered (Wp = 0.1) Variables 73
4-28 First 10 Eigenvectors Calculated from High-Pass Filtered (Wp = 0.3) Variables 74
4-29 Histograms of First 10 Eigenvectors Calculated from High-Pass Filtered (Wp = 0.3) Variables 74
4-30 First 10 Eigenvectors Calculated from High-Pass Filtered (Wp = 0.4) Variables 75
4-31 Histograms of First 10 Eigenvectors Calculated from High-Pass Filtered (Wp = 0.4) Variables 75
4-32 First 10 Eigenvectors Calculated from Band-Pass Filtered (Wp = [0.2, 0.4]) Variables 76
4-33 Histograms of First 10 Eigenvectors Calculated from Band-Pass Filtered (Wp = [0.2, 0.4]) Variables 76
4-34 First 10 Eigenvectors Calculated from Band-Pass Filtered (Wp = [0.2, 0.3]) Variables 77
4-35 Histograms of First 10 Eigenvectors Calculated from Band-Pass Filtered (Wp = [0.2, 0.3]) Variables 77
Chapter 1
Introduction
1.1 Background
With the development of technology, manufacturing systems are becoming increasingly complex. A typical continuous-time manufacturing process may be controlled and monitored by thousands of parameters such as temperature and pressure. With higher customer standards and higher operating costs, manufacturing companies are constantly seeking new ways to increase efficiency and reduce cost.
The Leaders for Manufacturing (LFM) Program is a joint effort among leading U.S. man-
ufacturing firms and both the School of Engineering and the Sloan School of Management
from the Massachusetts Institute of Technology. The goal of LFM is to identify, discover,
and translate into practice the critical factors that underlie world-class manufacturing. MIT
faculty and students and participating LFM companies have identified seven major themes
of cooperation. These research themes are Product and Process Life Cycle, Scheduling
and Logistics Control, Variation Reduction, Design and Operation of Manufacturing Sys-
tems, Integrated Analysis and Development, Next Generation Manufacturing, and Culture,
Learning, and Organizational Change.
The research and analysis presented in this thesis are directly related to Leaders For
Manufacturing Research Group 4 (RG4), whose focus is variation reduction in manufacturing
processes. Understanding variations and methods to reduce them can help companies to
improve yields, reduce defects, decrease product cycle time, and generate higher quality
products. In order to gain this understanding, RG4 attempts to answer questions such as: 1) how to effectively determine which process parameters to monitor and control? 2) what are useful techniques for determining multivariate relationships among control and quality variables? and 3) how to best communicate results and findings to managers and engineers at the participating companies?
There are many types of manufacturing processes in industry. The types of processes that this thesis focuses on are referred to as web processes. The defining characteristic of a web process is that the end product takes the form of a continuous sheet of the appropriate thickness, width, and length, which can be packaged into rolls or sliced into sheets.
Although multivariate analysis methods are applied to two data sets collected from web
processes, most of the tools discussed in this thesis can also be applied to analyze data from
other types of processes.
1.2 Previous Work
My research builds on the work conducted by previous LFM RG4 research assistants. In his
Master's thesis, "The Treatment of Outliers and Missing Data in Multivariate Manufacturing Data", Timothy Derksen developed strategies for dealing with outliers and missing data in large, multivariate manufacturing data [2]. He compared the effectiveness of statistics based on standard versus robust estimates of the mean, standard deviation, and the
correlation matrix in separating the outliers from the main population. In addition, he
developed maximum likelihood methods to treat missing data in large multivariate manu-
facturing data. Mark Rawizza's "Time-Series Analysis of Multivariate Manufacturing Data
Sets" [4] discussed various data analysis tools used in engineering and applied them to manu-
facturing data sets. He used fundamental preprocessing and data-reduction techniques such
as principal component analysis to present and reorganize manufacturing data. Furthermore,
he experimented with ARMA models and neural networks to assess the predictability of data
sets collected from both web processes and wafer processes.
1.3 Objective
The objective of this thesis is to apply a series of multivariate techniques to analyze information-rich data sets from continuous web manufacturing processes. In particular, much of the analysis is based on an understanding of the physics behind the manufacturing process. Combining multivariate analysis tools with an understanding of the underlying physics can produce results and insights that can be very valuable to company managers.
The questions asked in this thesis are: 1) how to effectively separate outliers from the
main population? 2) how to determine relationships among variables and subprocesses?
and 3) what are the best methods to categorize and group variables within an information-
rich multivariate data set? Results of various experiments focused on these three areas are
discussed. Hopefully they can lead to more comprehensive research in the general area of
data analysis of manufacturing processes in the future.
1.4 Thesis Organization
This thesis is divided into four major sections. Chapter 2 presents an overview of the major
multivariate analysis tools and methods used in the rest of the thesis. These tools deal with
preprocessing of the original data, outlier identification and analysis, correlation analysis,
spectral analysis, and principal component analysis.
Chapter 3, the second major section, utilizes the multivariate tools presented in Chapter
2 to analyze a web manufacturing data set. With the data set divided into in-line variables
and quality variables, the objective is to perform multivariate analysis on these two sets
of variables separately and to determine multivariate linear relationships between them. In
addition, correlation analysis is performed on Poisson-distributed defect densities.
Chapter 4, the third section, applies the basic tools to analyze a data set from a dif-
ferent manufacturing web process, where the in-line variables and the quality variables are
not identified. The analysis focuses on experimenting with ways to more effectively sep-
arate variables utilizing principal components. Experimental results show that PCA with
robust correlation coefficients and PCA with frequency-filtered variables are more effective
in grouping and identifying the variables that are correlated with each other.
Chapter 5, the final section, summarizes the important insights gained and suggests
possible areas of continued research.
Chapter 2
Basic Analysis Tools
2.1 Data Set
A data set contains information about a group of variables. The information is the values
of these variables for different times or situations. For example, we might have a data set
that consists of weather information for 50 states. There might be 20 variables such as
rainfall, average temperature, and dew point temperature, and 50 observations of these 20
variables representing the 50 states. The data set might also be the same 20 variables and
50 observations representing daily measurements of each of these 20 variables for one state
over 50 days. Both data sets can be represented as an m x n matrix, where m = 50 is the
total number of observations and n = 20 is the total number of variables.
The data sets used in this thesis are recorded from continuous-time web manufacturing
systems. A typical data set may consist of measurements of thousands of variables for
thousands of observations recorded over days. The variables can be categorized as either in-
line variables or end-of-line variables. The in-line variables of a manufacturing system control
and monitor the operation of the manufacturing process. Some typical in-line variables are
temperature, pressure, volume, speed, and so on. Furthermore, end-of-line variables, also
referred to as quality variables, provide managers and technicians with information on the
quality of the end product of the manufacturing process. Some typical quality variables are
defect size, defect location, thickness and strength.
2.2 Preprocessing
Preprocessing the data is an integral part of data analysis. Very rarely can large new data
sets be used unaltered for multivariate analysis. The following are three major parts of
preprocessing:
2.2.1 Missing Data
Within a raw manufacturing data set, very rarely are all the observations complete, espe-
cially when measurements are collected over days. Often, parts of machines or subprocesses
are shut down for maintenance or testing purposes. As a result, certain parameters are not or
cannot be recorded. These missing observations need to be treated before any multivariate
analysis. Timothy Derksen, in "The treatment of Outliers and Missing Data in Multivariate
Manufacturing Data", investigated methods of detecting, characterizing, and treating missing data in large multivariate manufacturing data sets [2]. In general, if a variable has most
of its observations missing, the variable should be removed completely from the data set.
Otherwise, the missing observations can be estimated using the EM algorithm described by
Little [6].
2.2.2 Constant Variables
Multivariate analysis allows for understanding variable behavior in a multi-dimensional
world. Any variables that are constant over time do not exhibit any information relevant to
multivariate analysis. As a result, variables that have zero variance should be removed from
the data set.
2.2.3 Normalization
For a web process data set that contains n variables and m observations, the n variables
can consist of both control and quality parameters such as temperature, pressure, speed,
thickness, density, and volume. Since all these variables are most likely measured in different
units, it is often very difficult to compare their relative values. To deal with this comparison
problem, normalization is applied to the variables.
For a given m x n matrix with i = 1, 2, ..., m observations and j = 1, 2, ..., n variables, where the value of the ith observation of the jth variable is denoted X_{ij}, the corresponding value in the normalized data set is denoted Z_{ij}.

Normalization is commonly defined as the following:

Z_{ij} = \frac{X_{ij} - \bar{X}_j}{\sigma_j}    (2.1)

where

\bar{X}_j = \frac{1}{m} \sum_{i=1}^{m} X_{ij}    (2.2)

\sigma_j^2 = \frac{1}{m-1} \sum_{i=1}^{m} (X_{ij} - \bar{X}_j)^2    (2.3)
In words, to calculate the normalized value Z_{ij} for the ith observation of the jth variable, we take the corresponding value X_{ij}, subtract the mean \bar{X}_j of the jth variable, and divide the result by the standard deviation \sigma_j of the jth variable. In the rest of this thesis, a variable that is said to be normalized is normalized to zero mean and unit variance.
There are benefits and drawbacks with performing normalization before multivariate
analysis. The following are reasons for normalization:
* Normalization causes the variables to be unit-less. For example, if the unit of X_{ij} is meters, X_{ij} - \bar{X}_j is also in meters. When the result is divided by \sigma_j, also measured in meters, the final value Z_{ij} will be unit-less. As a result of normalization, variables originally measured in different units can be compared with each other.
* Normalization causes all the variables to be weighted equally. Since the normalized
variables are zero-mean and unit-variance, each variable is weighted equally in de-
termining correlations among variables. Normalization is especially important before
performing multivariate analyses such as principal component analysis, because it gives
each variable equal importance. More on normalization and principal component anal-
ysis will be discussed in Section 2.6.
* Normalization is a way of protecting proprietary information inherent in the original
data. By taking away the mean and reshaping the variance, the information that is
proprietary can be removed. Protecting proprietary information is a very important
part of LFM's contract with its participating companies.
The following is one of the drawbacks of normalization:
* Normalization may increase the noise level. Since normalizing causes all the variables
to have unit variance, it is likely that some measured noise will be scaled so that it rivals
the more significant variables. As a result, normalization may distort the information
in the original data set by increasing the noise level.
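The z-scoring of Equation 2.1 is easy to sketch in a few lines. The following is a minimal illustration (the function name and sample data are hypothetical, not taken from the thesis):

```python
import numpy as np

def normalize(X):
    """Normalize each column of X to zero mean and unit variance.

    X is an m x n array: m observations of n variables. Columns with
    zero variance should be removed beforehand (see Section 2.2.2),
    since dividing by a zero standard deviation is undefined.
    """
    mean = X.mean(axis=0)            # Xbar_j, Equation 2.2
    std = X.std(axis=0, ddof=1)      # sigma_j with the m-1 divisor, Equation 2.3
    return (X - mean) / std          # Z_ij, Equation 2.1

# Example: three observations of two variables measured in different units.
X = np.array([[1.0, 200.0],
              [3.0, 400.0],
              [5.0, 600.0]])
Z = normalize(X)
```

After normalization, each column of Z has zero mean and unit variance, so the two variables can be compared despite their different original scales.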
2.3 Outlier Analysis
The detection and treatment of outliers is an important preliminary step in statistical
analysis. This section defines outliers, names the causes and effects of outliers, and presents
some univariate and multivariate tools of detecting outliers.
2.3.1 Definition
Outliers are defined as a set of observations that are inconsistent with the rest of the data.
It is very important to understand that outliers are defined relative to the main population.
2.3.2 Causes of Outliers
The following are the causes of outliers:
1. Extreme members - Since manufacturing data consist of variables recorded over thou-
sands of observations, it is possible that some observations can occasionally exhibit
extreme values.
2. Contaminants - These are observations that should not be grouped with the main
population. For example, if the main population is a set of observations consisting of the weights of apples, the weight of an orange is considered a contaminant if it is placed
in the same group. In a manufacturing process, a contaminant can be an observation
made while a machine is broken amidst observations made while the machine is properly
operating.
2.3.3 Effects of Outliers
Statistical analysis without the removal of outliers can produce skewed and misleading re-
sults. Outliers can potentially drastically alter the sample mean and variance of a popu-
lation. In addition, outliers, especially contaminants, can incorrectly signal the occurrence
of extreme excursions in a manufacturing process when the process is actually operating
normally.
2.3.4 Outlier Detection
For a data set with n variables and m observations, a potential outlier is a point that
lies outside of the main cluster formed by the general population. The following are some
methods used to determine outliers.
Univariate Method A univariate method of detecting outliers is to calculate the number of standard deviations each observation lies from the mean:

z_{ij} = \frac{x_{ij} - \bar{x}_j}{\sigma_j}    (2.4)

where x_{ij} is the value of observation i and variable j, \bar{x}_j is the sample mean of variable j, and \sigma_j is the sample standard deviation of variable j. Observations where |z_{ij}| > K_j, where K_j is a constant for variable j, can be categorized as outliers. Depending on the range of the values of observations for each variable, the value of K_j can be adjusted. To determine gross outliers, the value of K_j can be set to be large.
Multivariate Methods Equation 2.4 can be extended so that it represents a multivariate measure of the distance of all the variables away from the origin. Observations where |z_{ij}| > K, where K is a constant, can be treated as points lying outside of an n-dimensional cube centered on the sample mean. This multivariate method is very similar to the univariate one, except the value of K is constant for all variables. Similarly, this method can be effective in determining gross outliers, but in a manufacturing environment where most of the variables are correlated, this method is limited in its effectiveness in identifying outliers.
A more robust multivariate method of detecting outliers involves calculating the Euclidean distance to the origin of the n-dimensional space after all n variables are normalized to zero mean and unit variance. The square of the normalized Euclidean distance is defined as the following:

d_i^2 = \sum_{j=1}^{n} \frac{(x_{ij} - \bar{x}_j)^2}{s_j^2}    (2.5)

where x_{ij} is the value of observation i for variable j, and s_j^2 is the sample variance of variable j. Observations with d_i^2 > K, where K is a constant, lie outside of an ellipsoid centered on the origin and are considered outliers.
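The distance-based screen of Equation 2.5 can be sketched as follows. This is an illustrative implementation, not the thesis's own software; the function name, sample data, and threshold K are hypothetical:

```python
import numpy as np

def outlier_mask(X, K):
    """Flag observations whose squared normalized Euclidean distance
    from the sample mean exceeds K (Equation 2.5).

    X is an m x n array of m observations and n variables. Returns a
    boolean array of length m, True where the observation lies outside
    the ellipsoid d_i^2 = K centered on the sample mean.
    """
    mean = X.mean(axis=0)
    var = X.var(axis=0, ddof=1)                 # s_j^2, sample variance
    d2 = (((X - mean) ** 2) / var).sum(axis=1)  # d_i^2 for each observation
    return d2 > K

# Example: four observations near the origin plus one gross outlier.
X = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, -1.0],
              [0.5, -0.5], [50.0, 50.0]])
mask = outlier_mask(X, 3.0)   # flags only the last observation
```

Note that a gross outlier inflates the sample mean and variances used in Equation 2.5, which is one motivation for the robust estimates discussed in Section 1.2.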
2.4 Correlation Analysis
In a multivariate manufacturing environment, it is often desirable to measure the linear
relationship between pairs of variables or among groups of variables. By understanding these
relationships among variables, managers can gain insights into the manufacturing process.
One method of determining the linear relationship between variables is to calculate their
covariance and correlation.
Given two variables i and j, with m observations, the sample covariance s_{ij} measures the linear relationship between the two variables and is defined as the following:

s_{ij} = \frac{1}{m-1} \sum_{k=1}^{m} (x_{ki} - \bar{x}_i)(x_{kj} - \bar{x}_j)    (2.6)

For n variables, the sample covariance matrix S = (s_{ij}) is the matrix of sample variances and covariances of all combinations of the n variables:

S = (s_{ij}) = \begin{bmatrix} s_{11} & s_{12} & \cdots & s_{1n} \\ s_{21} & s_{22} & \cdots & s_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ s_{n1} & s_{n2} & \cdots & s_{nn} \end{bmatrix}    (2.7)

where the diagonal of S holds the sample variances of the n variables, and the rest of the matrix holds all the possible sample covariances of pairs of variables. The covariance of the ith and jth variables, s_{ij}, is defined by Equation 2.6, and the variance of the ith variable, s_{ii} = s_i^2, is defined as the following:

s_i^2 = \frac{1}{m-1} \sum_{k=1}^{m} (x_{ki} - \bar{x}_i)^2    (2.8)
Since the covariance depends on the scale of measurement of variable i and j, it is difficult
to compare covariances between different pairs of variables. For example, if we change the
unit of a variable from meters to miles, that covariance will also change. To solve this
problem, we can normalize the covariance by dividing by the standard deviations of the two
variables. The normalized covariance is called a correlation.
The sample correlation matrix R can be obtained from the sample covariance matrix and is defined as:

R = (r_{ij}) = \begin{bmatrix} 1 & r_{12} & \cdots & r_{1n} \\ r_{21} & 1 & \cdots & r_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ r_{n1} & r_{n2} & \cdots & 1 \end{bmatrix}    (2.9)

where r_{ij}, the sample correlation coefficient of the ith and jth variables, is defined as the following:

r_{ij} = \frac{s_{ij}}{\sqrt{s_{ii} s_{jj}}}    (2.10)

Since the correlation of a variable with itself is equal to 1, the diagonal elements of matrix R in Equation 2.9 are all 1s. In addition, notice that if the variables are all normalized to unit variance such that s_{ii} = 1 and s_{jj} = 1, then the correlation matrix R is equal to the covariance matrix S. Since most of the multivariate analysis discussed in this thesis deals with normalized variables, R is often substituted for S.
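The relationship between the covariance and correlation matrices can be sketched directly from the definitions above. This is an illustrative fragment, not the thesis's own code; the function name and sample data are hypothetical:

```python
import numpy as np

def correlation_matrix(X):
    """Sample correlation matrix R of the columns of X (Equation 2.9).

    Computes the sample covariance matrix S and rescales each entry by
    the standard deviations of the two variables involved, so that
    r_ij = s_ij / sqrt(s_ii * s_jj)  (Equation 2.10).
    """
    S = np.cov(X, rowvar=False)   # sample covariance matrix, m-1 divisor
    s = np.sqrt(np.diag(S))       # sample standard deviations
    return S / np.outer(s, s)

# Example: the second variable is an exact multiple of the first,
# so every correlation coefficient is 1 regardless of units.
X = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])
R = correlation_matrix(X)
```

Dividing by the standard deviations removes the units, which is why correlations of different variable pairs can be compared while covariances cannot.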
2.5 Spectral Analysis
Fourier Transform Fourier transforms can be an excellent tool for gaining insight into the behavior of variables in the frequency domain. For the jth variable observed at times i = 1, ..., m, the Fourier transform is defined as:

X_j(e^{j\omega}) = \sum_{i=1}^{m} x_{ij} e^{-j\omega i}    (2.11)
Autocorrelation Function The autocorrelation function measures a variable's correlation with itself over time. A typical random signal is more correlated with itself over a short time lag than over a long time lag. The autocorrelation of variable x_j is:

R_{x_j}(\tau) = E[x_{ij} \, x_{(i-\tau)j}]    (2.12)
Power-Spectral Density The power-spectral density (PSD) is the Fourier transform of the autocorrelation function of a random signal x_j(t):

P_{x_j}(\omega) = \mathcal{F}(R_{x_j}(\tau))    (2.13)

where \mathcal{F} is the Fourier transform operator and R_{x_j}(\tau) is the autocorrelation of the random signal x_j(t). For simplicity, the ensemble average of x_j(t) is assumed to be zero without any loss of generality. The calculation of the autocorrelation requires the ensemble average of x_j(t) x_j(t - \tau). Since our data consist of one sample sequence for each variable, this ensemble average is impossible to obtain. One technique to get around this problem is to assume the sequence is ergodic. Then the PSD is the magnitude squared of the Fourier transform:

P_{x_j}(\omega) = |\mathcal{F}(x_j(t))|^2    (2.14)
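Under the ergodic assumption, Equation 2.14 reduces PSD estimation to one FFT per variable. The following sketch illustrates this (it is not the thesis's implementation; the function name and test signal are hypothetical, and the mean is subtracted to enforce the zero-mean assumption stated above):

```python
import numpy as np

def psd_estimate(x):
    """Estimate the power-spectral density of a single observed
    sequence under the ergodic assumption (Equation 2.14): the
    squared magnitude of the discrete Fourier transform.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                      # enforce the zero-mean assumption
    return np.abs(np.fft.fft(x)) ** 2     # |F(x)|^2 at each frequency bin

# Example: a pure cosine at 8 cycles per record concentrates its
# power in frequency bins 8 and 64 - 8 = 56.
n = 64
x = np.cos(2 * np.pi * 8 * np.arange(n) / n)
P = psd_estimate(x)
```

In practice the raw squared-magnitude estimate (the periodogram) is noisy, and averaging over segments of the record is often used to smooth it; the sketch above shows only the basic relation in Equation 2.14.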
2.6 Principal Component Analysis
2.6.1 Basic Concept
Principal component analysis (PCA) is a mathematical method for expressing a data set in an alternative way. The method uses linear combinations of the original variables to transform the data set onto a set of orthogonal axes. The main objective of principal component analysis is two-fold: 1) data reduction, and 2) interpretation.

Principal component analysis is often referred to as data reduction rather than data restatement, because it preserves the information contained in the original data in a quite succinct way. Principal component analysis takes advantage of the relationships among the variables to reduce the size of the data while maintaining most of the variance in the original set. A data set with n variables and m observations can be reduced to a data set with k principal components and m observations, where k < n. In addition, since PCA transforms the original data onto a new set of axes, it often reveals relationships that are buried in the original data set. As a result, PCA is a powerful tool in multivariate analysis.
2.6.2 Geometric Representation
Principal components analysis can be best understood in terms of geometric representation.
We can start with a simple two-dimensional example. Table 2.1 shows 12 observations of 2
variables, X1 and X2.

Obs   1   2   3   4   5   6   7   8   9  10  11  12
X1    8   4   5   3   1   2   0  -1  -3  -4  -5  -8
X2    4   6   2  -2   3  -2   0   2  -2  -6  -1  -3

Table 2.1: 12 Observations of 2 Variables
Figure 2-1 represents the data in Table 2.1 using two different sets of axes. The points
in Figure 2-1a are interpreted relative to the original set of axes, while the same points
are interpreted relative to a new set of orthogonal axes in Figure 2-1b. The information is
preserved as the axes are rotated.
Figure 2-1: (a) Plot in Original Axes (b) Plot in Transformed Axes
Similar to Figure 2-1b, principal components are defined as a transformed set of coordinate
axes, obtained from the original data, used to describe the information content of the
data set. In a 2-dimensional data set, the first principal component is defined as the new
axis that captures most of the variability of the original data set. The second principal com-
ponent, perpendicular to the first one, is the axis that captures the second biggest variance.
The principal components are calculated in a minimal squared-distance sense. The distance
is defined as the perpendicular distance from the points to the candidate axis. The first
principal component is the axis for which the sum of the squared distances from the data points
to the axis is minimal among all possible candidates. The second principal component is
taken perpendicular to the first one, for which the sum of the squared distances is the second
smallest.
In a multivariate data set that extends over more than 2 dimensions, PCA finds directions
in which the multi-variable data contain big variances (and therefore much information). The first
principal component has the direction in which the data have the biggest variance. The
direction of the second principal component is that with the biggest variance among the
directions which are orthogonal to the direction of the first principal component, and so on.
After a few principal components, the remaining variance of the rest is typically small enough
that we can ignore them without losing much information. As a result, the original data
set with n dimensions (n variables) can be reduced to a new data set with k dimensions (k
principal components) where k < n.
2.6.3 Mathematical Definition
Principal component analysis takes advantage of the correlation among variables to find a new
set of variables which captures most of the variation within the data set in as few dimensions
as possible. The following is the mathematical definition [3]:
Given a data set with n variables and m observations, the first principal component
must satisfy the following conditions:
1. z1 is a linear function of the original variables.
z1 = w11 X1 + w12 X2 + ... + w1n Xn (2.15)
where w11, w12, ..., w1n are constants defining the linear function.
2. Scaling of the new variable z1.
w11^2 + w12^2 + ... + w1n^2 = 1 (2.16)
3. Of all the linear functions of the original variables that satisfy the above two conditions,
pick z1 that has the maximum variance.
Consequently, the second principal component must satisfy the following conditions:
1. z2 is a linear function of the original variables.
z2 = w21 X1 + w22 X2 + ... + w2n Xn (2.17)
where w21, w22, ..., w2n are constants defining the linear function.
2. Scaling of the variable z2.
w21^2 + w22^2 + ... + w2n^2 = 1 (2.18)
3. z1 and z2 must be perpendicular.
w11 w21 + w12 w22 + ... + w1n w2n = 0 (2.19)
4. The values of z2 must be uncorrelated with the values of z1.
5. Of all the linear functions of the original variables that satisfy the above conditions,
pick z2 that captures as much as possible of the remaining variance.
For a data set with n variables, there are a total of n possible principal components. Each
component is a linear combination of the original set of variables, is perpendicular to the
previously selected components, has values uncorrelated with those of the previous
components, and explains as much as possible of the remaining variance in the data.
In summary,
z1 = w'1 X = w11 X1 + w12 X2 + ... + w1n Xn
z2 = w'2 X = w21 X1 + w22 X2 + ... + w2n Xn (2.20)
...
zn = w'n X = wn1 X1 + wn2 X2 + ... + wnn Xn
where the random variable X' = [X1, X2, X3, ..., Xn] has a covariance matrix S with eigenvalue-
eigenvector pairs (λ1, e1), (λ2, e2), ..., (λn, en) where λ1 >= λ2 >= ... >= λn >= 0. The
principal components are the uncorrelated linear combinations z1, z2, z3, ..., zn, whose
variances, Var(zi) = w'i S wi, are maximized.
It can be shown that the principal components solely depend on the covariance matrix S of
X1, X2, X3, ..., Xn. This is a very important concept to understand. As described earlier,
the axes of the original data set can be rotated by multiplying each Xi by an orthogonal
matrix W:
zi = W Xi (2.21)
Since W is orthogonal, W'W = I, and the distance of Xi to the origin is unchanged:
z'i zi = (W Xi)'(W Xi) = X'i W'W Xi = X'i Xi (2.22)
Thus an orthogonal matrix transforms Xi to a point zi that is the same distance from the
origin, with the axes rotated.
Since the new variables z1, z2, z3, ..., zn in z = WX are uncorrelated, the
sample covariance matrix of z must be of the form:

Sz = [ s_z1^2    0     ...    0
        0      s_z2^2  ...    0
        ...
        0        0     ...  s_zn^2 ]   (2.23)
If z = WX, then Sz = WSW', and thus:

WSW' = [ s_z1^2    0     ...    0
          0      s_z2^2  ...    0
          ...
          0        0     ...  s_zn^2 ]   (2.24)
where S is the sample covariance matrix of X. From linear algebra, we know that given C'SC
= D, where C is an orthogonal matrix, S is a symmetric matrix, and D is a diagonal matrix,
the columns of the matrix C must be normalized eigenvectors of S. Since Equation 2.24
shows that the orthogonal matrix W diagonalizes S, W must equal the transpose of the matrix
C whose columns are normalized eigenvectors of S. W can be written as the following:

W = [ w'1
      w'2
      ...
      w'n ]   (2.25)

where w_i is the ith normalized eigenvector of S, written as the row w'i. The principal
components are the transformed variables z1 = w'1 X, z2 = w'2 X, ..., zn = w'n X in
z = WX. For example, z1 = w11 X1 + w12 X2 + ... + w1n Xn.
In addition, the diagonal elements in Equation 2.24 are the eigenvalues of S. Thus the
eigenvalues λ1, λ2, ..., λn of S are the variances of the principal components zi = w'i X:
s_zi^2 = λi (2.26)
Since the eigenvalues of S are the variances of the principal components, the percentage of the
total variance captured by the first k principal components can be represented as:
% of Variance Captured = (λ1 + λ2 + ... + λk) / (sum_{i=1}^{n} s_ii) (2.27)
The following is a summary of some interesting and useful properties of principal
components (Johnson and Wichern, p. 342):
* Principal components are uncorrelated.
* Principal components have variances equal to the eigenvalues of the covariance matrix
S of the original data.
* The rows of the orthogonal matrix W correspond to the eigenvectors of S.
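The whole procedure can be sketched via an eigendecomposition of the covariance matrix, using the Table 2.1 data as input. This is an illustrative route to the same quantities, not necessarily the exact computation used in the thesis:

```python
import numpy as np

def pca(X):
    """Principal components of an (m x n) data array via the covariance matrix.

    Returns the eigenvalues of S (the PC variances) in descending order, the
    matrix W whose rows are normalized eigenvectors of S, and the PC scores.
    """
    Xc = X - X.mean(axis=0)
    S = np.cov(Xc, rowvar=False)          # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)  # eigh: S is symmetric
    order = np.argsort(eigvals)[::-1]     # sort eigenvalues descending
    eigvals, W = eigvals[order], eigvecs[:, order].T
    return eigvals, W, Xc @ W.T

# the 12 observations of Table 2.1
X = np.array([[8, 4], [4, 6], [5, 2], [3, -2], [1, 3], [2, -2],
              [0, 0], [-1, 2], [-3, -2], [-4, -6], [-5, -1], [-8, -3]], dtype=float)
lam, W, z = pca(X)
var_captured = lam[0] / lam.sum()         # fraction captured by the first PC
```

The returned scores are uncorrelated, their variances equal the eigenvalues, and the eigenvalue ratio implements Equation 2.27.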
Chapter 3
Web Process 1
3.1 Background
The data set used in this chapter is collected from a continuous web manufacturing process
where more than 850 in-line control parameters are constantly monitored. The end-of-
line data comes from an optical scanner sensitive to small light-scattering defects, where 8
important quality parameters are measured with high precision. In this chapter, some of the
analysis tools described in Chapter 2 are utilized to characterize the multivariate behavior
of the in-line data, the multivariate behavior of the end-of-line data, and the statistical
relationship between the two.
3.2 Data
The data set from Web Process 1 consists of two major groups of variables: in-line variables
and end-of-line variables. The in-line data set consists of physical parameters that control
the production process, while the end-of-line data are parameters that indicate the quality
of the end product. The combined data set represents information for the manufacturing of
115 rolls of the end product.
The in-line data set contains 854 control parameters, measured approximately every 61
seconds for 4320 observations. The end-of-line data consist of 4836 measurements of 8 quality
parameters. The values of these quality parameters are collected by a real-time scanner that
sweeps across the web at constant frequency. One of the 8 quality variables is an indicator
of the type of defect that occurs at the end-of-line.
3.3 Preprocessing
As discussed in Chapter 2, raw data sets often need to be preprocessed before any multi-
variate analysis is performed. In the following two sections, techniques are applied to both
the in-line and end-of-line data in order to present more effectively the information in the
original data.
End-of-Line Data Sometimes the values of variables are not numeric. As a result, the
information contained in the variables must be encoded before any analysis. How to encode
these non-numeric values depends on the type of information they convey. For example, the
end-of-line data contains a variable that categorizes the different types of defects. There are
a total of 8 defect types, and each one of them is simply assigned a numeric value.
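The encoding step can be sketched as a simple label-to-code mapping. The defect-type names below are placeholders, since the actual category labels in the data set are not given:

```python
# Hypothetical defect-type labels; the real category names are not stated
# in the thesis, so these are illustrative only.
defect_types = ["scratch", "bubble", "stain", "scratch", "tear", "bubble"]

# assign each distinct defect type a numeric code, in order of first appearance
codes = {}
encoded = []
for d in defect_types:
    if d not in codes:
        codes[d] = len(codes)   # next unused integer code
    encoded.append(codes[d])
```

Note that such codes are only identifiers: the numeric ordering carries no physical meaning, which matters when the encoded variable is later used in any distance- or correlation-based analysis.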
In-Line Data There are a total of 854 in-line parameters, measured approximately every
61 seconds for a period of 3 days. Of all these parameters, 194 variables are constant over
the entire period. These 194 variables can be discarded without any further investigation.
In addition, 222 variables are also eliminated, because they are simply averages of other
parameters. Consequently, 438 in-line parameters are left for analysis.
3.4 Feature Characterization
Before performing any multivariate analysis, much insight can be obtained from examining
the data in the time and space domain.
3.4.1 In-Line Data
The in-line variables show fluctuations over time. This means the physical process does not
remain steady. It tends to change "significantly" over time. The following is a plot of the
behavior of a typical in-line parameter over time.
Figure 3-1: Time-Series Behavior of a Typical In-Line Variable
3.4.2 End-of-line Data
The end-of-line data include the sizes, shapes, positions and times of defects. Figure 3-2 is a
visual representation of the positions and times of the defects. The horizontal axis represents
the cross-web position, and the vertical axis represents the times when defects occur. Each
point on the graph represents a defect spot at a particular time and at a particular position
across the web. One can simply imagine Figure 3-2 as one big sheet of the end-product where
the defect locations are marked by dots. If the web moves at a fairly constant speed, the
time variable on the vertical axis is highly correlated with the down-web position of defects.
Figure 3-2: Cross-Web Position of Defects Over Time
Figure 3-2 shows a number of interesting features:
* Defects can be categorized into streaks and clouds. Some defects tend to occur at
the same cross-web position over time (streaks), while others appear to occur fairly
randomly on the web (clouds).
* Defects are significantly denser on the left side of the web than the right side.
* For certain periods of time, there are no defect occurrences across the web. These
could represent 'perfect' times when the manufacturing process is running without any
defects.
Perfect Observations Figure 3-2 shows that there are certain periods of time where
no single defect occurs across the web. There are two possible scenarios that can explain
these 'perfect' observations: 1) At these observations, the manufacturing process is running
perfectly and all the control parameters are at their optimal levels. Consequently, there are
no defects. 2) These 'perfect' observations are simply the result of the process being shut
down and the scanner recording no information.
After some investigation, it was discovered that the manufacturing process is occasionally
shut down for various maintenance reasons, such as the cleaning of rollers, etc. Since the
scanner continues to operate during these periods, no defects are recorded on the web. As
a result, it was determined that the 'perfect' observations are simply contaminants that do
not have any physical significance.
Figure 3-3: Streak and Cloud Defects
Cloud and Streak Defects Figure 3-3 shows that the end-of-line defects can be separated
into streak defects and cloud defects. The threshold for differentiating between streak and
cloud defects is 10 defect counts per unit distance across the web, where this distance is
0.1 percent of the web width. In other words, if there are more than 10 defect counts, over
all times, within a certain unit-distance block across the web, then the defects
within this block are counted as a defect streak. Defects within any blocks totaling fewer than
10 are categorized as cloud defects.
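The thresholding rule above can be sketched as follows, assuming the defects are available as an array of cross-web positions (the web width of 60 units and the synthetic defect data are assumptions for illustration):

```python
import numpy as np

def split_streak_cloud(positions, web_width, threshold=10, bin_frac=0.001):
    """Label each defect as streak (True) or cloud (False).

    positions: cross-web positions of defects, all times pooled together.
    A bin spanning bin_frac (0.1 percent) of the web width with more than
    `threshold` total defects is declared a streak location.
    """
    positions = np.asarray(positions, dtype=float)
    n_bins = int(round(1.0 / bin_frac))               # 1000 bins of 0.1% width
    edges = np.linspace(0.0, web_width, n_bins + 1)
    idx = np.clip(np.digitize(positions, edges) - 1, 0, n_bins - 1)
    counts = np.bincount(idx, minlength=n_bins)       # defects per bin
    streak_bins = counts > threshold
    return streak_bins[idx]                           # label per defect

# synthetic example: 50 defects piled at one position plus 30 scattered ones
rng = np.random.default_rng(2)
streak = np.full(50, 12.345)               # hypothetical streak position
cloud = rng.uniform(0, 60, size=30)        # scattered 'cloud' defects
labels = split_streak_cloud(np.concatenate([streak, cloud]), web_width=60.0)
```

Defects falling in a dense bin are labeled streak defects; the scattered points remain cloud defects.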
3.5 Correlation Analysis
In order to improve quality and reduce the defect rate, an interesting question that a plant
manager might ask is "are the occurrences of streak and cloud defects related to some
common physical factors, or are they caused by separate physical phenomena?"
Figure 3-3 shows that these two types of defects can be clearly separated from each other
and seem to resemble two separate physical processes. The cloud defects seem to be fairly
randomly distributed, while the streak defects are concentrated on the left side of the web and
seem somewhat correlated. Correlation analysis is a good method to apply here to determine
the relationships between streaks and between streak and cloud defects. Understanding the
correlations among streaks and clouds is a good beginning to understanding the underlying
physics that causes the defects.
3.5.1 Streak-Streak Correlation
Figure 3-3 indicates that most streaks occur near the left edge of the web. This suggests
that streak defects are not randomly generated. Some physical characteristics particular
on the left side of the web could be causing the streak defects. To test this hypothesis,
the correlation coefficients between all 45 combinational pairs of the 10 densest streaks are
calculated. If the streaks are caused by some common factor, the correlation coefficients
between streak would be close to 1 or -1. Conversely, if the streaks are not caused by some
common factor, the correlation coefficient should be closer to 0.
Figure 3-4: The 10 Densest Streaks Over Time
Method Figure 3-4 shows the ten densest streaks on the web, which are used for correlation
analysis. In order to calculate the correlation coefficients between streaks, each streak is
divided into approximately 257 time blocks. Consequently, a single streak can be represented
as a 257-element vector, where each element is the total defect count within each time block.
The correlation coefficient between any two 257-element vectors can be calculated using
Equation 2.10. Furthermore, each streak can also be divided into time blocks of other lengths.
The same procedure can be applied to calculate the correlation coefficients of the streaks
using different-length time blocks.
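The binning-and-correlating procedure can be sketched as below. The two synthetic streaks share a common set of defect times (standing in for a common physical cause) plus independent defects of their own; the actual streak data are not reproduced here:

```python
import numpy as np

def block_counts(defect_times, t_max, n_blocks):
    """Total defect count in each of n_blocks equal-length time blocks."""
    edges = np.linspace(0.0, t_max, n_blocks + 1)
    idx = np.clip(np.digitize(defect_times, edges) - 1, 0, n_blocks - 1)
    return np.bincount(idx, minlength=n_blocks)

def streak_correlation(times_a, times_b, t_max, n_blocks=257):
    """Correlation coefficient (Equation 2.10) between two streaks'
    257-element block-count vectors."""
    a = block_counts(times_a, t_max, n_blocks)
    b = block_counts(times_b, t_max, n_blocks)
    return np.corrcoef(a, b)[0, 1]

# two synthetic streaks with 200 shared defect times and 200 independent ones each
rng = np.random.default_rng(3)
common = rng.uniform(0, 1000, size=200)
streak_a = np.concatenate([common, rng.uniform(0, 1000, size=200)])
streak_b = np.concatenate([common, rng.uniform(0, 1000, size=200)])
r = streak_correlation(streak_a, streak_b, t_max=1000.0)
```

With half the defects shared, the correlation comes out clearly positive but well below 1, mirroring how independent "noise" defects dilute the correlation between physically related streaks.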
Results Figure 3-5a shows the correlation coefficients for all 45 combinations of the 10
defect streaks using the standard-length time blocks. Although most of the streaks are
positively correlated, the average correlation coefficient is only 0.08652.
Figure 3-5b and Figure 3-5c show the correlation coefficients of the 10 defect streaks using
double-length and quadruple-length time blocks respectively. There is a small but steady
increase in the correlation coefficients as the length of the time blocks is increased. For
double-length time blocks the average correlation coefficient is 0.1054, and for quadruple-
length time blocks, the average correlation coefficient is 0.122.
The increase in correlation coefficients with increasing time blocks indicates that while
the streaks are somehow correlated, most of the correlations are buried in high-frequency
noise. As the time block lengthens, some of the high-frequency noise is filtered out, resulting
Figure 3-5: Correlation Coefficients Between Streaks Using a) standard time block, b) double-length time block, c) quadruple-length time block
in higher correlation coefficients. But due to the uncertainty of the signal-to-noise ratio of
the process, the 'true' correlation coefficients between the streaks are still uncertain.
3.5.2 Streak-Cloud Correlation
This part of the analysis focuses on determining correlations between the cloud defects and
the streak defects. If the two types of defects are highly correlated, there should exist some
common process parameters that cause the occurrence of both streak and cloud defects. If
they are not highly correlated, the two types of defects are most likely caused by separate
process parameters. In addition, comparing the correlation coefficients calculated using
different time-blocks can present some information as to the frequency range where the two
types of defects are most correlated or uncorrelated.
Results By using time blocks varying from length 1 to length 100, the correlation coeffi-
cients are calculated between the streak and cloud defects. Figure 3-6 shows that the streaks
and cloud defects are positively correlated. For time blocks on the order of length 1 to length
20, the correlation coefficient is approximately 0.2. As the length of the time blocks increases
to the order of 70 to 100, the correlation coefficient gradually increases to an average of
Figure 3-6: Correlation Coefficients Between Streak and Cloud with Time Blocks of Length 1 to Length 100
approximately 0.35.
The positive correlations between the streak and cloud defects indicate that they are
related to some common underlying physics. In addition, the analysis shows that the correlation
is higher when longer time blocks are used. This suggests that some of the high-frequency
noise is filtered out as the time block increases, resulting in a more accurate representation
of the correlation coefficients between the streaks and clouds.
3.5.3 Interpretation
The defect data show the difficulties in determining the correlation coefficients between two
processes when the true signal-to-noise ratio is unknown. For example, in section 3.5.1, it
is shown that the correlation coefficients between the streaks increase as the time blocks
are lengthened. Lengthening the time blocks, in effect, removed some of the high frequency
Poisson noise component, resulting in a more accurate representation of the correlation
coefficients. More analysis needs to be done to quantify the effect of Poisson noise on the
correlation coefficient so that the 'true' correlation coefficients between streaks and between
streaks and clouds can be identified.
3.6 Poisson Distribution
Figure 3-3a shows that cloud defects seem to be randomly generated and fairly evenly dis-
tributed across the web, suggesting a Poisson distribution. In this section, analyses are
performed to look at the distribution of cloud defects over time. The key is to find out
whether or not the cloud defects exhibit a Poisson distribution, and if they do, over what
frequency range.
3.6.1 Method
Similar to the method utilized for determining correlations between streak and cloud defects,
the entire set of cloud defects is divided into time blocks of a certain length. Once the total
cloud defect count is determined for each time block, a histogram of the total defect count in
each time block is presented and compared to a plot of a typical Poisson distribution. Time
blocks with different lengths can be used to determine if there are certain frequency ranges
where the cloud defects resemble Poisson distributions.
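The method can be sketched as below, using synthetic uniformly random defect times (which are approximately Poisson under any blocking) with an average of about 2 defects per block, matching the average reported in Section 3.6.2:

```python
import numpy as np
from math import exp, factorial

def count_histogram(defect_times, t_max, n_blocks):
    """freq[k] = number of time blocks containing exactly k defects."""
    edges = np.linspace(0.0, t_max, n_blocks + 1)
    idx = np.clip(np.digitize(defect_times, edges) - 1, 0, n_blocks - 1)
    counts = np.bincount(idx, minlength=n_blocks)   # defects per block
    return np.bincount(counts)                      # histogram of block counts

def poisson_pmf(lam, k_max):
    """Ideal Poisson probabilities P(k) = lam^k e^-lam / k! for k = 0..k_max."""
    return np.array([lam ** k * exp(-lam) / factorial(k) for k in range(k_max + 1)])

# synthetic cloud: 2000 uniformly random defect times over 1000 time blocks
rng = np.random.default_rng(4)
times = rng.uniform(0, 1000.0, size=2000)           # ~2 defects per block on average
freq = count_histogram(times, 1000.0, n_blocks=1000)
expected = poisson_pmf(2.0, len(freq) - 1) * 1000   # ideal Poisson block counts
```

Comparing `freq` against `expected` (visually or with a goodness-of-fit test) is the analogue of comparing Figure 3-7a to Figure 3-7b.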
Figure 3-7: (a) Cloud Distribution Using Fixed-Length Time Blocks, (b) Ideal Poisson Distribution Using the Same Fixed-Length Time Blocks
3.6.2 Results
A time block of a certain fixed length is selected to determine the histogram of the defect
density. For the selected standard length, the average number of cloud defects in each stan-
dard time block is approximately 2. Figure 3-7a shows the histogram of these cloud defects
using these fixed-length time blocks, and Figure 3-7b shows an ideal Poisson distribution
generated with the same average defect density using the same standard-length time blocks.
A comparison of Figures 3-7a and 3-7b shows that the cloud defect distribution does not
resemble a Poisson distribution for these standard-length time blocks.
Figure 3-8: Distributions of Cloud Defects Using x2, x4, and x6 Time Blocks
Figure 3-9: Ideal Poisson Distributions Using x2, x4, and x6 Time Blocks
Figure 3-8 presents the histograms of cloud defect density using 2 times, 4 times and
6 times the length of the standard time blocks used in Figure 3-7. Figure 3-9 shows the
ideal Poisson distributions with the same average defect density as the cloud distributions
generated using time blocks x2, x4, and x6 the standard length in Figure 3-7. A comparison
of Figure 3-8 to Figure 3-9 shows that the cloud defect density does not exhibit a Pois-
son distribution when measured using small time blocks. But as the length of the time
block is increased, the distribution of the cloud defects becomes similar to that of a Poisson
distribution.
3.7 Principal Component Analysis
As discussed in Section 2.6, principal component analysis (PCA), also referred to as the
Karhunen-Loeve transformation (KLT), is a powerful tool in multi-variable data analysis. In
a multi-variable data space, the number of variables that have to be observed simultaneously
can be enormous. As a result, PCA is applied to reduce the number of variables without
losing much information and to interpret the data using a different set of axes.
3.7.1 PCA of the End-of-Line Data
Principal component analysis is applied to 7 of the 8 end-of-line quality variables, ex-
cluding the variable that characterizes the defect type. Figure 3-10 shows the time-series
behavior of the first 3 principal components. Figure 3-11 shows the accumulated variance
captured as a function of the number of principal components used. One can see that ap-
proximately 90% of the information contained in the 7 end-of-line variables is captured by
the first 3 principal components.
3.7.2 PCA of the In-Line Data
Principal component analysis is applied to 438 in-line variables. Figure 3-12 displays the
first 4 principal components of the in-line data. Changes in the principal components imply
that the production process fluctuates over time.
Figure 3-10: First 3 Principal Components of the End-of-Line Data
Figure 3-11: Percent of Variance Captured by PCs
Figure 3-12: First 4 PCs of the In-Line Data
3.7.3 Interpretation
A comparison of the two sets of principal components presented in Figure 3-10 and Fig-
ure 3-12 can reveal some interesting insight into the nature of the relationship between the
in-line and the end-of-line data. Since the principal components are another way of repre-
senting the process variables, fluctuations in the principal components indicate fluctuations
in the underlying process. As indicated before, the principal components of the in-line data
fluctuate noticeably in time. Assuming there exists a close relationship between the in-line
data and the end-of-line data, the principal components of the end-of-line data should also
show similar fluctuations. However, Figure 3-10 does not confirm this. Instead, the principal
components of the end-of-line data seem to behave completely independently of the princi-
pal components of the in-line data. As a result, PCA shows that there is no strong linear
relationship between the in-line and the end-of-line data from web process 1.
Chapter 4
Web Process 2
4.1 Background
Web process 2 is a multi-staged manufacturing system that takes raw materials and trans-
forms them into the final product through a number of sequential subprocesses. Raw materi-
als are introduced into the first stage, and after going through certain chemical and physical
changes, the desired output is produced. Next, the output from the first stage becomes the
input for the second stage. Again, under certain control parameters, input is transformed into
output. This process is repeated a number of times, as input turns into output, and output
becomes input. The output of the final stage of this multi-staged manufacturing process
becomes the final product. It must be noted that the output from each stage can be very
different from the input. As a result, the final product, or the output of the final stage, is
often nothing like the initial input.
Each stage in this multi-staged manufacturing process can be treated as a subprocess.
Although the subprocesses can occasionally be shut down for maintenance or testing pur-
poses, they are continuous processes with real-time controlling and monitoring parameters.
But between these stages, there can be a certain amount of delay as material is transferred
between subprocesses. The output of one stage often does not immediately become the input
for the next stage. Due to various factors such as supplier power and customer demand, pro-
duction within certain stages can be sped up or slowed down, resulting in delays between
subprocesses. Understanding these delays can be important when performing multivariate
data analysis.
4.2 Data
The data set for web process 2 contains 1518 variables recorded every time unit for a total of
2000 observations. These variables can be either control or monitor variables. As mentioned
in a previous chapter, control variables are also referred to as in-line variables, and monitor
variables can be called quality variables. Since the variables in the data set are arranged in
alphabetical order, the order of the data does not have any physical significance. In other
words, the variables from all the different subprocesses are scrambled together. In addition,
they are not separated into in-line or quality variables, nor are they grouped according
to subprocesses.
4.3 Preprocessing
A major fraction of the data set containing 1518 variables and 2000 observations is either
corrupted or missing. After unwanted variables and observations are removed, the remaining
working data set contains 1010 variables and 1961 observations, which is about 65 percent
of the original data set. Next, variables whose sample variance is zero are deleted from the
data set, because they contain no useful information. The remaining data set contains 1961
recordings of 860 variables, which include both control and quality parameters. Before ap-
plying multivariate analysis, the variables are all normalized to zero mean and unit variance
using methods discussed in Section 2.3.1.
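The zero-variance removal and normalization steps can be sketched as follows (assuming corrupted rows and unwanted variables have already been removed, as described above; the data here are synthetic):

```python
import numpy as np

def preprocess(X):
    """Drop zero-variance variables and normalize the rest to zero mean and
    unit variance, as in the preprocessing steps described above.

    X is an (observations x variables) array with corrupted rows removed.
    Returns the normalized array and the indices of the retained columns.
    """
    keep = X.std(axis=0) > 0              # zero-variance columns carry no information
    Xk = X[:, keep]
    Z = (Xk - Xk.mean(axis=0)) / Xk.std(axis=0)
    return Z, np.flatnonzero(keep)

# synthetic example with one constant (zero-variance) variable
rng = np.random.default_rng(5)
X = rng.normal(size=(100, 5))
X[:, 2] = 7.0                             # a constant variable, as in the raw data
Z, kept = preprocess(X)
```

After this step, the covariance matrix of `Z` equals its correlation matrix, which is why R can substitute for S in the later analysis.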
4.4 Feature Characterization
4.4.1 Quality Variables
Unlike web process 1, the quality variables for web process 2 do not record the actual physical
location of defects. Instead, the quality variables are various modified physical parameters.
As a result, no one figure can capture the quality of the final product.
4.4.2 In-Line Variables
Since there are many in-line variables, it would be very difficult to analyze them in depth one
by one. But a simple look at the behavior of individual variables over time could provide
some valuable insight before performing any multivariate analysis. The following are 10
typical types of behavior associated with the in-line variables:
Figure 4-1: Ten Types of In-Line Variable Behavior
These graphs present some interesting features with regards to the behavior of the manu-
facturing process. Each of the above 10 plots represents a group of variables with a particular
type of behavior. Almost all the in-line variables can be categorized into one of the ten types
of behavior described below:
1. Variable 1 - represents the set of variables whose values remain fairly constant except
for sharp transient outliers at certain observations.
2. Variable 2 - represents a set of variables that increase linearly and reset themselves
periodically.
3. Variable 3 - belongs to a group of variables that tend to remain constant for a period
of time before jumping to another value.
4. Variable 4 - generally low-frequency quantized behavior with sharp transient outliers.
5. Variable 5 - linear time-series behavior.
6. Variable 6 - high-frequency oscillatory behavior that drifts over time.
7. Variable 7 - high-frequency periodic behavior that is confined tightly within a certain
range.
8. Variable 8 - fairly random high-frequency behavior.
9. Variable 9 - high-frequency behavior with relative small amplitudes compared to sharp
transient outliers.
10. Variable 10 - high-frequency behavior with a lower bound.
4.5 Outliers Analysis
As defined in Section 2.3.1, outliers are observations that are inconsistent with the rest of
the data. Identifying and understanding outliers in a manufacturing setting can be very
important to plant managers whose goals are to eliminate variation and to reduce defects.
The plant managers are interested in knowing the answer to the following questions:
1. Are the outliers simple extensions of the normal behavior?
2. If not, is there any physical significance behind the outliers?
3. If so, can the outliers be grouped according to this physical significance?
4.5.1 Normalized Euclidean Distance
The normalized Euclidean distance method, as explained in Section 2.3.4, is a good way to
identify outliers. In this case, this method is applied to 860 variables and 1961 observations.
The plot of the normalized Euclidean distance in Figure 4-2 shows that there are at least
two distinct populations in the data set. One group of observations, where the normalized
Euclidean distance is above approximately 1000, shows sharp and spiky behavior over time,
while the other group of observations, where the normalized Euclidean distance is less than 1000, shows slow-moving and fairly constant time-series behavior.

Figure 4-2: Normalized Euclidean Distance
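The computation itself is short. A minimal sketch, assuming (per the definition in Section 2.3.4) that the distance is taken in standardized units, i.e., each variable is z-scored before the Euclidean norm of each observation is formed:

```python
import numpy as np

def normalized_euclidean_distance(X):
    # Standardize each variable (column) to zero mean and unit variance,
    # then take the Euclidean norm of each observation (row).
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    return np.sqrt((Z ** 2).sum(axis=1))
```

Observations whose distance exceeds a chosen threshold (1000 for this data set) would then form the outlier set.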
In order to define outliers, it can be assumed that in a properly functioning multivariate
manufacturing environment, all the process parameters operate within a certain normal range
of values, both individually and collectively. Consequently, behavior outside this normal
range can be categorized as outlier behavior, contributed mostly by contaminants.
Figure 4-2 shows that an appropriate normal range of behavior can be defined as obser-
vations with normalized Euclidean distance less than 1000, and the outliers set corresponds
to observations with normalized Euclidean distance greater than 1000. Figure 4-3 is a plot
of these two separated groups: 1) the normal set, and 2) the outliers set. The time-series
behavior of the normalized Euclidean distance of these two sets does not seem
to be extensions of each other. The normal set exhibits fairly constant and stable behavior,
while the outlier set is transient and very unstable.
4.5.2 Time Series Model - PCA
In addition to normalized Euclidean distance, various time-series methods, such as principal component analysis (PCA), can also be good methods to identify outliers. PCA groups together variables that are correlated with each other. Since multivariate outliers are produced by sets of variables that exhibit similar 'outlying' behavior at certain times, PCA should
be able to group together the variables that contribute to the same outliers. PCA should be more effective in grouping outliers than the normalized Euclidean distance method, because it groups variables that are physically significant together rather than simply separating the observations into two populations.

Figure 4-3: Outlier and Normal Behavior Based on Normalized Euclidean Distance
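The decomposition used throughout this chapter can be sketched as follows, assuming (as is standard for mixed-scale manufacturing data) that the principal components are taken from the correlation matrix rather than the covariance matrix:

```python
import numpy as np

def principal_components(X, k=10):
    # Standardize the data, eigendecompose the correlation matrix, and
    # project the standardized data onto the k leading eigenvectors.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    R = np.corrcoef(X, rowvar=False)
    vals, vecs = np.linalg.eigh(R)        # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]    # keep the k largest
    return Z @ vecs[:, order], vecs[:, order]
```

The first return value corresponds to the principal-component time series plotted in the figures; the second holds the associated eigenvectors examined later in this chapter.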
Principal Component Analysis Figure 4-4 represents the first 10 principal components
calculated from the data set containing 860 variables and 1961 observations. Similar to the
plot of the normalized Euclidean distance in Figure 4-2, Figure 4-4 shows that principal
components also exhibit sharp discontinuities at certain observations where their values
jump very sharply. A more careful look at Figure 4-4 shows that the first ten principal
components exhibit two major types of outliers.
* 1st type - Step outliers. This set of outliers is associated with the 1st and 3rd principal
components, where the values of the principal components between approximately
observation 970 and 1220 are substantially different than the values of most of the
other observations.
* 2nd type - Transient outliers. They are associated with the 3rd through the 10th
principal components, where the outlier values are very different from the rest of the
population only for very brief periods of time.
Figure 4-4: First Ten Principal Components of Web Process 2 Data Set
These two types of outliers seem to be controlled by two different sets of physical parameters. The first type of outliers takes place when the principal component jumps suddenly
from a certain range of values to a different range of values, stays there for a period of time,
and jumps back to the original range of values. The second type of outliers are transient out-
liers that occur abruptly and for brief periods of time. The contrasting behavior of these two
types of outliers indicates that they are controlled by separate underlying physical processes.
Looking at the 3rd through 10th principal components associated with transient outliers,
we can identify two distinct groups within the transient outlier set.
* The first group is associated with the 3rd, 4th, and 5th principal components, where
their values exhibit sharp changes at approximately observations 100, 1750, and 1950
occurring with similar relative proportions.
* The second group is associated with the 7th, 8th, 9th, and 10th principal components,
where their values change sharply at approximately observation 600.
PCA with Frequency Filtering Transient outliers occur when the values of the principal
components abruptly jump to a very different value for a short period of time before returning
to the original values. As discovered from Figure 4-4, there are two kinds of transient outliers
associated with the 3rd through 10th principal components. The figure shows that the 1st
kind of transient outliers is spread out over the 3rd, 4th, 5th, and 6th principal components,
while the second kind is spread out over the 7th, 8th, 9th, and 10th principal components.
PCA collapses similar variables onto the same dimensions. But in this case, each of the two kinds of transient outliers is spread out over more than one dimension. One
hypothesis for this phenomenon is that the original data set is dominated by low-frequency
behavior. As a result, the first few principal components are also dominated by low-frequency
behavior, and the high-frequency transient outliers are not dominant enough to be grouped
to the same dimensions.
Since we know that the transient outliers are associated with high-frequency behavior,
one way of collapsing the transient outliers into a smaller number of dimensions is to perform
high-pass filtering on the original data set before applying PCA. The idea here is that with
the low-frequency components filtered out, PCA can group the high-frequency transient
outliers much more effectively.
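The thesis does not specify the filter design; a zero-phase Butterworth high-pass filter is one plausible sketch. SciPy's convention, like the figures here, normalizes the passband edge Wp so that the Nyquist frequency is 1:

```python
import numpy as np
from scipy import signal

def highpass_filter(X, wp, order=4):
    # Zero-phase (forward-backward) Butterworth high-pass filtering along
    # the observation axis; wp is the normalized passband edge (Nyquist = 1).
    b, a = signal.butter(order, wp, btype='highpass')
    return signal.filtfilt(b, a, X, axis=0)
```

PCA would then be applied to highpass_filter(X, 0.1) or highpass_filter(X, 0.3) to remove the lowest 10 or 30 percent of the frequency range.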
Figure 4-5: High-Pass Filters with Wp=0.1 and Wp=0.3
Figure 4-5 shows the graphs of two high-pass filters utilized to remove the low-frequency
components in the original variables. Figure 4-6 presents the first 10 principal components calculated from variables with the lowest 10 percent of the frequency range filtered out, and Figure 4-7 presents the first 10 principal components calculated from variables with the lowest 30 percent of the frequency range filtered out.

Figure 4-6: First Ten Principal Components from 90% High-Pass Filtered Data

Figure 4-7: First Ten Principal Components from 70% High-Pass Filtered Data
Figure 4-6 and Figure 4-7 show that PCA, obtained from high-pass filtered variables,
does remove the low-frequency behavior but does not effectively separate the different kinds
of outlier behavior. For the two kinds of transient outliers discussed in the previous section,
PCA with frequency filtering still spreads each one of them out over more than one principal
component.
The first kind of transient outlier appears in the 3rd, 4th, 8th, and 10th principal components in Figure 4-6, where the lowest 10 percent of the variable frequency range is removed. Although the second kind appears predominantly in the 6th principal component, it also shows up in the 5th and 9th principal components. In Figure 4-7, where the lowest 30 percent of the variable frequency range is filtered out, the two kinds of transient outliers appear slightly
better defined. The first kind mostly occupies the 6th principal component, while the second
kind mostly shows up in the 5th principal component.
4.5.3 Identifying Outlying Variables
From a plant manager's point of view, outliers represent shifts and changes in process parameters that can potentially affect the quality of the end product. As a result, we want
to develop methods and tools to help managers to identify the physics behind these outlier
behaviors. To understand the underlying physics, we need to determine which variables or
combinations of variables contribute to which outliers. This way, we are able to analyze
the variables and determine the causes for the outliers. In this section I will present some
methods to group the variables according to their contributions to the outliers.
Transient Outliers Focusing on the transient outliers in the top plot of Figure 4-3, which
does not include the set of outliers from observation 1000 to 1200, we can see there are
mainly 4 regions where the values of the normalized distance jump up suddenly and return
quickly. These 4 regions are located approximately at observation 100, 600, 1750, and 1950.
The goal here is to determine which variables contribute to these different transient outliers
in these 4 regions. One method to find the contributing variables is to find the variables
that also experience sudden changes in values at the same observations corresponding to the
4 transient-outlier regions.
Since a manufacturing data set often contains hundreds of variables, looking at the time
behavior of each variable might be burdensome in determining the causes of transient outliers.
The following procedure is a simpler way to find the contributing variables. For a data set Xij, where i represents the observation number 1, . . . , n, and j represents the variable number 1, . . . , m, let D be a difference matrix with dimensions (n - 1) x m, whose rows are equal to the difference between adjacent rows of X. Dij is defined as:

Dij = X(i+1)j - Xij (4.1)
Let M be a row vector of length m where the jth element represents the average of the jth column of the difference matrix D. The 4 transient-outlier regions correspond to observations i = 100, 600, 1750, and 1950 respectively. The variables that contribute to region 1, where i = 100, are the variables with index j that satisfy Dij >> Mj. The variables that contribute to the other outlier regions can be determined by using the appropriate i's.
The basic method described here is to determine the variables whose greatest change between two consecutive observations, relative to their average change, occurs at the observations corresponding to the 4 transient-outlier regions. Thus, we can determine a set of variables that contributes to the transient outlier behavior in each of the 4 regions.
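A sketch of this procedure follows. The factor of 10 below is a hypothetical stand-in for the '>>' condition, and absolute differences are used so that jumps in either direction are caught:

```python
import numpy as np

def transient_contributors(X, outlier_rows, factor=10.0):
    # D[i, j] = |X[i+1, j] - X[i, j]|; M[j] is variable j's average change.
    D = np.abs(np.diff(X, axis=0))
    M = D.mean(axis=0)
    # For each outlier observation, report the variables whose jump there
    # dwarfs their average change.
    return {i: np.where(D[i] > factor * M)[0] for i in outlier_rows}
```

For the web process 2 data, outlier_rows would be the observations near 100, 600, 1750, and 1950.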
Results show that this method is effective in determining the variables that contribute
to the transient outliers in the different regions. Figure 4-8 is a plot of a set of identified
variables that correspond to outliers in region 1 and 4.
Eigenvector Analysis The eigenvectors associated with the principal components reveal
how much each variable contributes to the time-series behavior of the corresponding principal
components. As a result, eigenvectors associated with principal components that exhibit
outlier behavior can potentially reveal the variables that contribute most to the outliers.
* Step Outliers - Since the first principal component represents most of the behavior as-
sociated with the step outliers, the corresponding eigenvector can provide information
as to the contributing variables. Figure 4-9 represents plots of the first principal component and the corresponding eigenvector. As discussed in Section 4.5.1, we can see that the 1st principal component is dominated by the outlier behavior from approximately observation 1000 to 1200. The corresponding eigenvector shows that 122 variables are weighted significantly more than the rest of the variables. These 122 variables are the main causes for this type of outlier.

Figure 4-8: Variables Identified that Contribute to Transient Outliers in Region 1 and 4

Figure 4-9: The First Principal Component and the Corresponding Eigenvector from Process 2 Data
* Transient Outliers - With the 122 variables that contribute to the step outliers removed,
the eigenvectors of the correlation matrix and the associated principal components are
calculated for the remaining 738 variables and 1961 observations. Figure 4-10 shows the first 10 principal components, and Figure 4-12 shows the corresponding eigenvectors.
Figure 4-10: The First Ten Principal Components from 738 Variables

Figure 4-11: The First Ten Eigenvectors from 738 Variables
Since the two kinds of transient outliers are spread out over the first 10 principal
components, the variables that contribute to these two kinds of transient outliers are
spread out over the 10 eigenvectors. As a result, Figure 4-12 shows that all the variables
are weighted fairly evenly in the determination of the principal components, and it is
hard to determine which variables contribute most to the outliers from examining the
eigenvectors. Eigenvector analysis is not very effective in isolating variables when the
outlier behaviors are spread out over many principal components.
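When an eigenvector is dominated by a subset of variables, as in the step-outlier case above, the reading can be automated. A minimal sketch, where the factor of 3 over the median weight is a hypothetical cutoff rather than one used in the thesis:

```python
import numpy as np

def heavily_weighted(w, factor=3.0):
    # Indices of variables whose absolute eigenvector weight stands out
    # from the bulk of the weights.
    w = np.abs(np.asarray(w, dtype=float))
    return np.where(w > factor * np.median(w))[0]
```

Applied to the first eigenvector of the full data set, a rule of this form could recover the 122 step-outlier variables; applied to the eigenvectors here, it would return most of the 738 variables, confirming that no small subset dominates.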
4.6 Variable Grouping
This section addresses the question: how can variables that are related to some common
physical process be effectively grouped together? In order to group or separate variables, it
is very important to have some understanding of the underlying physics of the manufacturing
process. In web process 2, the manufacturing process is divided into subprocesses, where the
output of one subprocess becomes the input of another. The data set contains both control
and quality variables from all the subprocesses. It is reasonable to hypothesize that variables
that are related to the same subprocess are more correlated with each other than variables
from different subprocesses.
The following 3 subsections present 3 methods of variable separation. Section 4.6.1
presents principal component analysis with a focus on examining the associated eigenvectors
calculated from the correlation matrix of the original data. Section 4.6.2 and Section 4.6.3
introduce two methods where some of the noise in the original data set is removed before
performing PCA. Based on the hypothesis that variables related to the same subprocess
are more correlated than variables from different subprocesses, Section 4.6.2 shows how a
more robust correlation matrix can be used to calculate PCA. In Section 4.6.3, variables are
frequency-filtered prior to the calculation of the correlation matrix. Results will show that
these two methods are more effective than standard PCA in capturing the variable groups
within the original data set.
4.6.1 Principal Component Analysis
Principal component analysis groups variables together based on the variables' correlation
with each other. Recall from Section 2.6.3 that the ith principal component is a weighted linear combination of the original set of variables, obtained from the following relationship:

zi = wi'X (4.2)

where wi' is the eigenvector associated with the ith principal component, zi, and X is the
original data set.
Equation 4.2 shows that the eigenvectors characterize the weights of variables in the cal-
culation of the principal components, and the eigenvectors reveal which variables contribute
the most to the time behavior of the corresponding principal components. As a result, examining the eigenvectors obtained from the correlation matrix of the original data can provide important information as to how the variables are grouped.
Figure 4-12: First 10 Eigenvectors of 738 Variables

Figure 4-13: Histograms of the First 10 Eigenvectors of 738 Variables
Figure 4-12 shows the eigenvectors associated with the first 10 principal components obtained from the correlation matrix of 738 variables, and Figure 4-13 shows the histograms of the eigenvectors. In Figure 4-13, the symbol 'ht' represents the height of the
middle bar. Since the sum of the area under the bars for each eigenvector is the same, the
height of the middle bar is a good indication of how widely the values of the eigenvectors
are distributed. The plots of the first 10 eigenvectors and the associated histograms reveal
that almost all the variables are weighted towards their respective principal components. No
single eigenvector is dominated by a small number of variables.
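The 'ht' statistic can be computed directly. A sketch, where the bin count and value range are assumptions matched to the histograms shown rather than values stated in the thesis:

```python
import numpy as np

def middle_bar_height(w, bins=25, value_range=(-0.1, 0.15)):
    # Height of the histogram bar containing zero; since the total area is
    # fixed, a taller middle bar indicates a narrower weight distribution.
    counts, edges = np.histogram(w, bins=bins, range=value_range)
    mid = np.searchsorted(edges, 0.0, side='right') - 1
    return counts[mid]
```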
Interpretation Based on the assumption that variables from the same subprocess are
highly correlated due to their common link to the underlying physics of the subprocess,
variables from different subprocesses should be less correlated. Since the data set contains
variables from many independent subprocesses, it is expected that many of these variables
will be uncorrelated with each other. Consequently, for each eigenvector calculated from
the correlation matrix, there should be variables that make significant contribution to the
associated principal components, and there should also be variables that make little or no
contribution to the associated principal components.
Contrary to the above hypothesis, Figure 4-12 shows that almost all the variables in the
first 10 eigenvectors contribute to some extent to the time behavior of the first 10 principal
components. This is not consistent with the initial assumption that variables from different
subprocesses should not be correlated and, thus, should not all contribute to the same
principal components. From this, we can conclude that there is a lot of noise in the original
data set, resulting in accidental correlations between variables from different subprocesses.
Thus PCA does not group the variables as well as it could. Methods should be developed to
improve the signal-to-noise ratio of the eigenvectors so that PCA can better categorize the
variables collected from different subprocesses.
4.6.2 PCA with Robust Correlation Matrix
Ideally, only variables that are from the same subprocess or that are controlled by the same
underlying physics should contribute to the same principal component. Consequently, it can
be assumed that the variables from different subprocesses are correlated mostly by accident,
and the correlation between them can be considered as noise. One method to improve the
signal-to-noise ratio in the eigenvectors of the correlation matrix is to create a more robust
correlation matrix by eliminating some of the accidental correlations between variables.
Figure 4-14: Magnitude of Correlation Coefficients of 738 Variables in Descending Order in (a) Normal Scale, (b) Log Scale
Method Figure 4-14 shows the magnitude of the correlation coefficients, rij, calculated
from the 738 variables arranged in a descending order both on a normal scale and a log
scale. In order to create a more robust correlation matrix, where some of the accidental correlations are removed, let ε be the cutoff correlation coefficient, where |rij| < ε is set to 0, and |rij| > ε maintains its value. Principal component analysis is performed using this more robust correlation matrix to determine the grouping of the variables.
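A sketch of the method follows. One caveat worth noting: zeroing entries can make the matrix lose positive semidefiniteness, so small negative eigenvalues may appear; they are harmless for the grouping use here.

```python
import numpy as np

def robust_pca(X, cutoff):
    # Zero out correlation coefficients below the cutoff magnitude,
    # then eigendecompose the thresholded (robust) correlation matrix.
    R = np.corrcoef(X, rowvar=False)
    R[np.abs(R) < cutoff] = 0.0
    np.fill_diagonal(R, 1.0)
    vals, vecs = np.linalg.eigh(R)
    order = np.argsort(vals)[::-1]
    return vecs[:, order], vals[order]
```

With weak cross-correlations zeroed, the matrix becomes nearly block-diagonal, and each leading eigenvector concentrates on one block of related variables.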
Results
Cutoff = 0.06 Figure 4-15 shows the first ten eigenvectors calculated from the robust correlation matrix where ε = 0.06, and Figure 4-16 shows the corresponding histograms.
Cutoff = 0.10 Figure 4-17 shows the first ten eigenvectors obtained from the robust correlation matrix where correlation coefficients below 0.1 are set to 0, and Figure 4-18 shows the corresponding histograms.
Figure 4-15: First 10 Eigenvectors Calculated from Robust Correlation Matrix with Cutoff = 0.06

Figure 4-16: Histograms of the First 10 Eigenvectors Calculated from Robust Correlation Matrix with Cutoff = 0.06
Figure 4-17: First 10 Eigenvectors Calculated from Robust Correlation Matrix with Cutoff = 0.10

Figure 4-18: Histograms of the First 10 Eigenvectors Calculated from Robust Correlation Matrix with Cutoff = 0.10
Cutoff = 0.15 Figure 4-19 shows the first 10 eigenvectors calculated from the robust correlation matrix where ε = 0.15, and Figure 4-20 shows the corresponding histograms.
Figure 4-19: First 10 Eigenvectors Calculated from Robust Correlation Matrix with Cutoff at 0.15

Figure 4-20: Histograms of First 10 Eigenvectors Calculated from Robust Correlation Matrix with Cutoff at 0.15
Cutoff = 0.18 Figure 4-21 shows plots of the first 10 eigenvectors obtained from the robust correlation matrix with cutoff coefficient equal to 0.18, and Figure 4-22 shows the histograms of the first 10 eigenvectors.
Figure 4-21: First 10 Eigenvectors Calculated from Robust Correlation Matrix with Cutoff at 0.18

Figure 4-22: Histograms of First 10 Eigenvectors Calculated from Robust Correlation Matrix with Cutoff at 0.18
Interpretation The figures of eigenvectors obtained from correlation matrices with differ-
ent cutoff values show that a more robust correlation matrix is more effective in grouping
correlated variables. As the cutoff values increase, eigenvectors show that the variables that
contribute to the associated principal components are weighted more, while the variables
that do not contribute to the associated principal components are weighted less. In addition, the histograms of these eigenvectors show that the distributions of the eigenvectors become increasingly narrower as the cutoff correlation coefficient is increased. As was discussed before, a narrower distribution means a taller middle bar. The average height of
the middle bar of the first 10 eigenvectors increases from 86.0 to 98.1, to 113.3, to 133.7, and
to 137.8 as the cutoff level increases from 0 to 0.06, to 0.10, to 0.15, and to 0.18. A narrower
distribution means that for each eigenvector, only a small number of variables contribute
to the associated principal component, while most of the variables make minimal or no
contribution to the associated principal component. This is consistent with the hypothesis
that variables from different subprocesses should not all contribute to the same principal
components.
Comparison Figure 4-23 is a comparison of 5 eigenvectors calculated from 5 different correlation matrices obtained with different cutoff values. The plots show that the eigenvectors calculated using the more robust correlation matrices group the variables much better than the eigenvector calculated using the original correlation matrix. One can see that the signal-to-noise ratio of the values of the eigenvector increases as the cutoff correlation coefficient level increases.
The histograms of the eigenvectors in Figure 4-24 also indicate the improvements in the
signal-to-noise ratio. The figure shows that the distributions of the eigenvectors get narrower
as the cutoff coefficients are increased. The height of the middle bar increases from 78 to
270 as the cutoff coefficient increases from 0 to 0.18. This means that as the correlation
matrix becomes more robust, accidentally correlated variables are weighted less, while the
significant variables are weighted more.
Figure 4-23: A Comparison of Eigenvectors Calculated from (a) the Original Correlation Matrix, (b) Robust Correlation Matrix with Cutoff=0.06, (c) Robust Correlation Matrix with Cutoff=0.10, (d) Robust Correlation Matrix with Cutoff=0.15, (e) Robust Correlation Matrix with Cutoff=0.18

Figure 4-24: A Comparison of Histograms of the Eigenvectors
4.6.3 PCA with Frequency-Filtered Variables
Figure 4-10 in Section 4.6.1 shows that with the exception of transient outliers, the first few
principal components of the 738 variables are mostly dominated by low-frequency behav-
ior. In Section 4.5.2, attempts were made to isolate transient outliers by performing PCA
on variables with the low-frequency components removed. In the previous section, it was shown that a robust correlation matrix can be effective in separating variables and reducing noise in PCA. In this section, we hope that PCA with frequency-filtered variables can also remove the noise components in the original data set and more effectively group the variables.
Method Attempts are made to perform principal component analysis after the original set of variables is frequency-filtered. The idea is that noise can be filtered out in certain frequency bands, so that PCA can show the same promising results as PCA with a robust correlation matrix with regard to grouping variables.
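A sketch of the pipeline, using a zero-phase Butterworth band-pass filter as an assumed design (the thesis specifies only the band edges, e.g. Wp=[0.2, 0.4]):

```python
import numpy as np
from scipy import signal

def filtered_correlation(X, band, order=4):
    # Band-pass each variable (zero-phase), then form the correlation
    # matrix of the filtered variables for use in PCA.
    b, a = signal.butter(order, band, btype='bandpass')
    Xf = signal.filtfilt(b, a, X, axis=0)
    return np.corrcoef(Xf, rowvar=False)
```

The eigenvectors of filtered_correlation(X, [0.2, 0.4]) then give the groupings examined below; replacing btype with 'highpass' and band with a scalar Wp reproduces the high-pass variants.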
Frequency-Filters Figure 4-25 shows samples of a high-pass filter and a band-pass filter used to remove noise in the original data set.
Figure 4-25: (a) High-Pass Filter with Wp=0.3, (b) Band-Pass Filter with Wp=[0.2, 0.4]
Results For variables that are high-pass filtered with Wp=0.1, Figure 4-26 illustrates the first ten eigenvectors calculated from the variables' correlation matrix, and Figure 4-27 shows the histograms of the eigenvectors.
Figure 4-26: First 10 Eigenvectors Calculated from High-Pass Filtered (Wp=0.1) Variables

Figure 4-27: Histograms of First 10 Eigenvectors Calculated from High-Pass Filtered (Wp=0.1) Variables
Figure 4-28 shows the eigenvectors calculated from the correlation matrix of high-pass filtered variables with Wp=0.3, and Figure 4-29 shows the histograms of the eigenvectors. Figure 4-30 shows the eigenvectors calculated from the correlation matrix of high-pass filtered variables with Wp=0.4, and Figure 4-31 shows the histograms of the eigenvectors.
Figure 4-28: First 10 Eigenvectors Calculated from High-Pass Filtered (Wp=0.3) Variables
Figure 4-29: Histograms of First 10 Eigenvectors Calculated from High-Pass Filtered (Wp=0.3) Variables
Figure 4-30: First 10 Eigenvectors Calculated from High-Pass Filtered (Wp=0.4) Variables
Figure 4-31: Histograms of First 10 Eigenvectors Calculated from High-Pass Filtered (Wp=0.4) Variables
Figure 4-32 shows the eigenvectors calculated from the correlation matrix of band-pass
filtered (Wp=[0.2, 0.4]) variables. Figure 4-33 shows the histograms of the eigenvectors.
Figure 4-32: First 10 Eigenvectors Calculated from Band-Pass Filtered (Wp=[0.2, 0.4]) Variables
Figure 4-33: Histograms of First 10 Eigenvectors Calculated from Band-Pass Filtered (Wp=[0.2, 0.4]) Variables
Figure 4-34 shows the eigenvectors calculated from the correlation matrix of band-pass
filtered (Wp=[0.2, 0.3]) variables. Figure 4-35 shows the histograms of the eigenvectors.

Figure 4-34: First 10 Eigenvectors Calculated from Band-Pass Filtered (Wp=[0.2, 0.3]) Variables

Figure 4-35: Histograms of First 10 Eigenvectors Calculated from Band-Pass Filtered (Wp=[0.2, 0.3]) Variables

Interpretation The figures of eigenvectors obtained from the correlation matrices of high-
pass and band-pass filtered variables show that PCA with frequency-filtered variables is also
effective in grouping correlated variables together. The histograms indicate that the distri-
butions of these eigenvectors are much narrower than the distributions of the eigenvectors
associated with the correlation matrix of the original data. The average heights of the middle
bars of the first 10 eigenvectors associated with the filtered variables obtained from filters
with normalized pass-band frequency Wp = 0.1, 0.3, 0.4, [0.2, 0.4], and [0.2, 0.3] are 155, 237,
230, 128, and 114 respectively. Compared to an average middle-bar height of 86 for the
distributions of the first 10 eigenvectors of the original data set, the distributions of the
eigenvectors from frequency-filtered variables are much narrower. Consequently, it is
reasonable to state that by removing the noise in the original data set, PCA with
frequency-filtered variables improves the signal-to-noise ratio of the eigenvectors: significant
variables are weighted more, and accidentally correlated variables are weighted less.
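The middle-bar comparison above can be reproduced with a simple metric: histogram each eigenvector's components and record the height of the bin containing zero. The bin range and count used here are assumptions for illustration; the thesis does not state its binning.

```python
import numpy as np

def middle_bar_heights(eigvecs, n_bins=21):
    """For each eigenvector (a column), histogram its components over
    [-0.2, 0.2] and return the height of the central bin. A taller
    middle bar means more components near zero, i.e. a narrower
    distribution in which only a few variables carry large weights."""
    edges = np.linspace(-0.2, 0.2, n_bins + 1)
    heights = []
    for col in eigvecs.T:
        counts, _ = np.histogram(col, bins=edges)
        heights.append(counts[n_bins // 2])   # the bin straddling zero
    return np.array(heights)

# A sparse eigenvector (few large weights) versus a diffuse one.
rng = np.random.default_rng(1)
sparse = np.zeros(800)
sparse[:20] = 0.15
diffuse = rng.uniform(-0.1, 0.1, size=800)
h = middle_bar_heights(np.column_stack([sparse, diffuse]))
print(h)   # the sparse vector has the taller middle bar
```

Under this metric, an eigenvector that isolates one small group of variables scores a tall middle bar, matching the interpretation that filtering weights significant variables more.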
Chapter 5
Conclusion and Suggested Work
This thesis presented various methods for analyzing multivariate data sets from continuous
web manufacturing processes. The analysis techniques described in Chapter 2 were applied
to data sets from two different web processes in Chapter 3 and Chapter 4. These analysis
techniques, combined with an understanding of the physics of the manufacturing processes,
can produce insights into information-rich data sets. Experimental results show that
both normalized Euclidean distance and principal component analysis are effective in sep-
arating the outliers from the main population. Correlation analysis on Poisson-distributed
defect densities shows the difficulty of determining the true correlation between variables
when the signal-to-noise ratio of the underlying processes is unknown. Principal compo-
nent analysis is a good way to determine the existence of linear relationships between sets
of variables. Based on the hypothesis that variables from the same subprocess are more cor-
related than variables from different subprocesses, both principal component analysis with
a robust correlation matrix and principal component analysis with frequency-filtered variables
are effective in grouping variables.
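The normalized Euclidean distance summarized above can be sketched in a few lines: scale each variable by its standard deviation so no one variable dominates, then measure each sample's distance from the mean. The function name and the synthetic data are illustrative assumptions, not the thesis's own code or data.

```python
import numpy as np

def normalized_distance(data):
    """Distance of each sample (row) from the variable-wise mean,
    with each variable scaled by its own standard deviation."""
    z = (data - data.mean(axis=0)) / data.std(axis=0)
    return np.sqrt((z ** 2).sum(axis=1))

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 8))
X[0] += 10.0          # plant one gross outlier in the first sample
d = normalized_distance(X)
print(int(np.argmax(d)))   # → 0, the planted outlier
```

Samples with distances far above the bulk of the population are flagged as outlier candidates for further inspection.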
Hopefully, the results of my experiments can lead to more research in the area of mul-
tivariate analysis of manufacturing data in the future. Other multivariate methods can be
explored to identify and to treat outliers. In addition, mathematical models can be built to
determine the effects of Poisson noise on the calculation of correlation between processes.
Furthermore, mathematical methods can be developed to quantify the effects of non-linear
operations on the correlation matrices, on the removal of noise, and on the effectiveness of
grouping variables. Combining a solid understanding of the underlying physics with a mas-
tery of analysis techniques can lead to tremendous progress in the analysis of
manufacturing data.
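As a starting point for the Poisson-noise question raised above, a small simulation can show how Poisson counting noise attenuates the observed correlation between two defect-density processes that share a common underlying rate. The gamma-distributed rates and the sample size are arbitrary assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20000

# Two perfectly correlated underlying defect rates (assumed
# gamma-distributed here purely for the sake of the example)...
base = rng.gamma(shape=4.0, scale=2.0, size=n)
rate_a, rate_b = base, base

# ...but each is observed only through Poisson counts.
counts_a = rng.poisson(rate_a)
counts_b = rng.poisson(rate_b)

r_true = np.corrcoef(rate_a, rate_b)[0, 1]   # 1.0 by construction
r_obs = np.corrcoef(counts_a, counts_b)[0, 1]

# The counting noise pulls the observed correlation below the true one.
print(round(r_true, 3), round(r_obs, 3))
```

For rates with variance V and mean M, the observed correlation is attenuated toward roughly V/(V + M), which is why the true correlation cannot be recovered without knowing the signal-to-noise ratio of the underlying processes.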