Data Mining Manufacturing Data

Dave E. Stevens

Eastman Chemical Company

Kingsport, TN

Presentation Outline

• Intro: Data Mining Manufacturing Data

• Data Preparation

• Principal Component Analysis

• Partial Least Squares

• PLS Discriminant Analysis

Manufacturing Data: Then and Now

• 40 Years Ago
– Few Measurements
– Temp, Press., Flows

• Today
– Many Measurements
– Very Often
– Creates Large Data Sets

• Purposes For Measuring
– Process “State”
– Relationships (X, X to Y)
– Classification
– Optimization

Concerns With Current Manufacturing Data

• Dimensionality: (Large)

>1000 process variables every few seconds

>10 quality variables every few hours

Data Overload: the analyst concentrates on only a few variables and ignores most of the information!

• Collinearity: Not 1000 independent things at work. Only a few underlying events affecting all variables. Variables are all highly correlated.

• Noise:

• Missing Data:

Multivariate Data Concept

[Figure: BreakLoad Control Chart and Elongation Control Chart]

Is This Process In Control?

Data Preparation

• Data collected in a Process Data Historian will have process up and down times recorded from the instrumentation

• Need a software tool that permits easy methods to clean the data and do initial Exploratory Data Analyses

• JMP Software
– Interactive Graphing
– Removal of Outliers
  • Graphically or by Variable Selection Criteria
– Join and/or Subset Data Tables
– Statistical Analyses
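The cleaning steps above can also be sketched in code. This is a minimal illustration in Python, not the JMP workflow itself: the record layout, the `feed_flow` cutoff used to detect down time, and the median-absolute-deviation rule used as the outlier criterion are all assumptions chosen for the example.

```python
from statistics import median

# Hypothetical historian records: (timestamp, feed_flow, temp).
# A feed_flow near zero is assumed to mean the process was down.
records = [
    ("08:00", 101.0, 250.2),
    ("08:10", 99.5, 249.8),
    ("08:20", 0.0, 120.0),    # down time
    ("08:30", 0.2, 135.0),    # still starting up
    ("08:40", 100.4, 250.5),
    ("08:50", 100.9, 612.0),  # instrument spike
    ("09:00", 98.8, 250.1),
    ("09:10", 100.1, 249.5),
]

MIN_FLOW = 50.0  # assumed cutoff separating "running" from "down"


def drop_down_time(rows, min_flow=MIN_FLOW):
    """Keep only rows recorded while the process was running."""
    return [r for r in rows if r[1] >= min_flow]


def drop_outliers(rows, col, k=5.0):
    """Remove rows whose value in `col` is more than k MADs from the median."""
    values = [r[col] for r in rows]
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return list(rows)
    return [r for r in rows if abs(r[col] - med) <= k * mad]


running = drop_down_time(records)
clean = drop_outliers(running, col=2)
print([r[0] for r in clean])
```

Filtering down time first matters here: the start-up temperatures would otherwise distort the median and spread used by the outlier rule.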

Principal Components Analysis

Understanding Relationships Between Process Variables

Principal Component Analysis

• Principal Component Analysis is a projection technique

• Raw data are first “centered” and “scaled”

• Each principal component represents a direction through the data that captures the maximum amount of the remaining raw data variation

• For each principal component (a), a new data value is generated for each obs. (i) as a linear combination of the raw X variables (k):

$$t_{i,a} = b_{a,1} X_{i,1} + b_{a,2} X_{i,2} + \dots + b_{a,k} X_{i,k} \quad \text{for each obs. } i$$

where the b's are loadings (each between -1 and 1)

• Each successive principal component represents less and less of the raw data variation
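The centering, scaling and projection steps can be sketched with NumPy. This is an illustrative sketch, not the procedure used by JMP or SIMCA-P: the data values are made up, and the singular value decomposition is one common way (assumed here) to obtain the component directions.

```python
import numpy as np

# Made-up data: 6 observations (rows) of 3 correlated process variables.
X = np.array([
    [10.0, 200.0, 1.0],
    [12.0, 210.0, 1.3],
    [14.0, 221.0, 1.4],
    [16.0, 229.0, 1.8],
    [18.0, 241.0, 1.9],
    [20.0, 250.0, 2.2],
])

# Center each variable to zero mean and scale it to unit standard deviation.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# SVD of the scaled data: the rows of Vt are the component directions.
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
loadings = Vt            # loadings[a, k] plays the role of b_{a,k}
scores = Z @ loadings.T  # scores[i, a] is t_{i,a}

# Fraction of the total variation captured by each component.
explained = s**2 / np.sum(s**2)

print(np.round(explained, 3))
print(np.round(scores[:, 0], 2))
```

Because the three made-up variables move together, almost all of the variation lands on the first component, which is the collinearity situation described earlier in the slides.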

Principal Component Analysis Fundamentals

[Figure: data in X1, X2, X3 space with projections onto the 1st PC and 2nd PC]

PCA: Scores

[Figure: Obs. i in x1, x2, x3 space projected onto the 1st PC and 2nd PC lines, giving ti,1 and ti,2]

The scores tia (observation i, dimension a) are the places along the component lines where the observations are projected.

PCA: Loadings

[Figure: 1st PC line in x1, x2, x3 space with its angles to the three coordinate axes]

The loadings pak (dimension a, variable k) indicate the importance of variable k to the given dimension. pak is the direction cosine (the cosine of the angle between the given component line and the xk coordinate axis).
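The direction-cosine reading of the loadings can be checked numerically. The sketch below uses a made-up component direction (an assumption for illustration, not a value from the slides) and shows that each loading equals the cosine of the angle between the component line and the corresponding axis.

```python
import numpy as np

# Hypothetical first-component direction in (x1, x2, x3) space.
direction = np.array([2.0, 1.0, -2.0])
p = direction / np.linalg.norm(direction)  # unit-length loading vector

# Angle between the component line and each coordinate axis.
axes = np.eye(3)
angles_deg = np.degrees(np.arccos(axes @ p))

# Each loading is the cosine of its angle, so it lies in [-1, 1],
# and the squared loadings of one component sum to 1.
print(np.round(p, 3))
print(np.round(angles_deg, 1))
print(round(float(np.sum(p**2)), 6))
```

A loading near 1 or -1 means the component line runs almost along that variable's axis; a loading near 0 means the variable contributes little to that component.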

PCA Example

• 10 process responses obtained on each observation

• Data represented weekly process response averages

• Data spanned 10 months

• Objective: Determine if the system was stable.

PCA Score Plot

[Figure: score plot of PC #1 vs. PC #2; annotation: Process Shift June 30 (5_30)]

PCA Loadings Plot

[Figure: loadings plot of Loadings PC #1 vs. Loadings PC #2 showing variables X1 through X10, beside the PC #1 vs. PC #2 score plot annotated Process Shift June 30 (5_30)]

Relative to the process shift, X1 and X5 were high in value and X4 and X8 were low in value. Positively correlated variables were X1 with X5 and X4 with X8. Negatively correlated variables were X1 and X5 versus X4 and X8.

Process variable X1 increased in value when the system shifted from the left side to the right side on the PCA Score plot

Variables X1 and X5 were positively correlated

Partial Least Squares Technique

Understanding Relationships Between Process & Response Variables

Partial Least Squares Fundamentals

[Figure: X Space (X1, X2, X3) and Y Space (Y1, Y2, Y3), each with planes and projections]

TA Filter Example

• Objective: Relate Filtrate, TA Catalyst and Dryer Temp to Filter Speed, Vacuum, Wash Acid, Weir Level, Nash Discharge Pressure and Feed Tank Temperature
– Keep Filtrate High, TA Catalyst Low

• Data: 12 Hour Averages from PI collected over a four month period

TA Filter

TA Filter Relationships

[Figure: PLS weight plot for the TA filter variables; surviving label: Catalyst]

Higher filter speed and vacuum pressure increased the filtrate flow and catalyst content but lowered the dryer temp. Higher weir level, Nash discharge pressure and Op tank temp increased filtrate flow. Wash acid flow had no driving effect on the responses.

PLS Results

• Obtain Weight Plots (Previous Slide)
– Shows the inter-relationships between the Xs and Ys

• Obtain Regression Coefficients
– Can be used to generate response surface plot

• Display Variables Important to Prediction (VIP)

• Display Residual Plots and Distance to the Model Plot
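The weights, regression coefficients, VIP values and residuals listed above can be sketched for a single response. This is a minimal NIPALS-style PLS written for illustration, not the SIMCA-P implementation: the data are synthetic, the three X columns are placeholders rather than the TA filter variables, and the function names `pls1` and `vip` are hypothetical.

```python
import numpy as np


def pls1(X, y, n_components):
    """NIPALS PLS for a single response; X and y must already be centered."""
    X, y = X.copy(), y.copy()
    W, P, T, q = [], [], [], []
    for _ in range(n_components):
        w = X.T @ y
        w = w / np.linalg.norm(w)   # X weights, unit length
        t = X @ w                   # X scores
        tt = t @ t
        p = X.T @ t / tt            # X loadings
        c = (y @ t) / tt            # y loading
        X = X - np.outer(t, p)      # deflate X
        y = y - c * t               # deflate y
        W.append(w)
        P.append(p)
        T.append(t)
        q.append(c)
    W, P, T = np.array(W).T, np.array(P).T, np.array(T).T
    q = np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)  # coefficients on the scaled X
    return W, T, q, coef


def vip(W, T, q):
    """Variable Importance in Projection for each X variable."""
    ss = q**2 * np.sum(T**2, axis=0)  # y variation explained per component
    n_vars = W.shape[0]
    return np.sqrt(n_vars * (W**2 @ ss) / ss.sum())


# Synthetic data: the response depends on columns 0 and 1 but not on column 2.
rng = np.random.default_rng(0)
n = 40
X_raw = rng.normal(size=(n, 3))
y_raw = 2.0 * X_raw[:, 0] + 1.0 * X_raw[:, 1] + 0.1 * rng.normal(size=n)

Z = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0, ddof=1)
yc = y_raw - y_raw.mean()

W, T, q, coef = pls1(Z, yc, n_components=2)
importance = vip(W, T, q)
residuals = yc - Z @ coef

print("weights (component 1):", np.round(W[:, 0], 2))
print("regression coefficients:", np.round(coef, 2))
print("VIP:", np.round(importance, 2))
```

In this synthetic case the column that does not drive the response should end up with a small weight and a low VIP, which is the kind of conclusion drawn for wash acid flow in the TA filter example.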

Correlation Does Not Always Mean Causation

PLS Discriminant Technique

Determine What Drives Data Groups To Be Different

Objective

• Given groups of data from a particular process, determine what makes the groups different with respect to the given measurements.

• Example: TA %T

– Measurements: 4-HMB, TMA, TPAD, 4-HBA, 4-CBA, IPA, BA, PTAD, p-TA, 2,7-DCF, 2,6-DCF, 4-4-DCB, 3,5-DCF, 9-F-2-CA, 9-F-4-CA, 2,6-DCA, 4,4-DCS, L*, a*, b*, .1%, .9%, Mean, %T

– Daily Numbers

– Data taken from Convey Line #1 and #2
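One common way to set up this kind of analysis, assumed here rather than taken from the slides, is to code group membership as a dummy response and fit a PLS component to it. The sketch below uses made-up numbers and placeholder measurement names (`m1`, `m2`, `m3`), not the TA measurements listed above.

```python
import numpy as np

# Made-up data: rows are daily samples, columns are three measurements.
names = ["m1", "m2", "m3"]
X = np.array([
    [5.1, 0.20, 9.8],   # group A
    [5.3, 0.22, 10.1],
    [5.0, 0.19, 10.0],
    [4.9, 0.21, 9.9],
    [3.9, 0.35, 10.0],  # group B
    [4.1, 0.33, 9.9],
    [4.0, 0.36, 10.2],
    [3.8, 0.34, 9.8],
])
# Dummy response: +1 for group A, -1 for group B.
y = np.array([1.0, 1.0, 1.0, 1.0, -1.0, -1.0, -1.0, -1.0])

# Center and scale the measurements; center the dummy response.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
yc = y - y.mean()

# First PLS component: the weight vector is the direction in X
# with the largest covariance with the group coding.
w = Z.T @ yc
w = w / np.linalg.norm(w)
t = Z @ w  # discriminant scores, one per sample

for name, weight in zip(names, w):
    print(f"{name}: {weight:+.2f}")
print(np.round(t, 2))
```

The sign and size of each weight indicate which measurements run high in which group: here `m1` is high in group A, `m2` is high in group B, and `m3` contributes little to the separation.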

PLS Discriminant Analysis

[Figure: plot showing the High %T and Low %T groups]

What Measurements Separated the Groups?

The high %T group ($DA1) was high in %T, 0.1, Mean and L. The low %T group ($DA2) had several measurements that were high in value and were positively correlated (see next slide for details).

The low %T group ($DA2) had several variables that were correlated and high in value: 4,4'-DCS, 4-CBA, TMA and p-TA

Computer Software

• JMP Software
– http://www.jmpdiscovery.com

• SIMCA-P from Umetrics
– http://www.umetrics.com
