Complexity Challenges to the Discovery of Relationships in Eddy Current Non-destructive Test Data CPT John R. Brence United States Military Academy Donald E. Brown, PhD University of Virginia

Page 1: Complexity Challenges to the Discovery of Relationships in ...helper.ipam.ucla.edu/publications/sdm2002/sdm2002_4008.pdf · Complexity Challenges in Eddy Current NDT Data Approach

Complexity Challenges to the Discovery of Relationships in Eddy Current Non-destructive Test Data

CPT John R. Brence
United States Military Academy

Donald E. Brown, PhD
University of Virginia

Outline

• Background Information
• Eddy Current Non-destructive Tests (NDT)
• Approach
• Algorithms
• Results
• Interpretation of Results
• Conclusions
• Future Research
• Questions

Background (1 of 2)

• Many commercial & military aircraft reached or exceeded original design life
  – USAF aircraft 20-35+ years old
  – KC-135 (40 yrs. old) extended for 25 years
  – Civilian airlines Boeing 727 family, introduced in 60’s
• Corrosion is a serious threat especially to older aircraft
  – Significant increase in maintenance costs
  – Increasing concern about structural integrity

Background: Aloha Flight 243 (2 of 2)

Relationship Hypothesis (1 of 2)

• Relationship between calibration specimen results & classifying corrosion on KC-135 parts

• Current artificial corrosion processes show characteristics similar to those found in naturally corroded lap joints

Relationship Hypothesis (2 of 2)

[Images: Natural Corrosion (KC-135 Specimen); Artificial Corrosion (Calibration Specimen)]

Eddy Current Non-destructive Tests

• Problems with current visual representation
  – Requires considerable expertise to create and interpret
  – Need for visual clarity leads to data generalization, averaging, or overlooked points ~ accuracy?
  – Missed corrosion may cause a catastrophic accident

Approach: Outline

• Data acquisition
• Data transformation & consistency
• Model development & feature selection
• Model training & testing
• Model evaluation
• Model selection

Approach: Graphic

[Figure: process flowchart. Box labels: Data acquisition; Data transformation; Consistent Data (Yes/No); Data mining; Feature selection; Variable transform; Model development; Model testing & evaluation; Iterate; Model selection.]

Approach: Data Acquisition

• Institute of Aerospace Research
  – Calibration specimens
  – Retired KC-135 specimens
• Data
  – Induced voltage measurements from multi-frequency scans
  – Calibration specimen E1

[Figure: scan grid with positions labeled 1, n, m, and nm, and the label "Scan Direction".]

Approach: Data Transformation & Consistency (1 of 5)

Eddy Current Specimen E1 data

– 4 different scan frequencies are the 4 predictor variables (5.5 kHz, 8 kHz, 17 kHz, & 30 kHz)

– Merged 4 scan frequency files into one file
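
A minimal sketch of the merge step, assuming each frequency's scan has already been loaded as an n × m grid of voltages (the slides do not describe the actual file format). The short names k5, k8, k17, and k30 follow the labels used for the four frequencies in this presentation; the function name `merge_scans` is hypothetical.

```python
from typing import Dict, List, Tuple

# Predictor names for the 5.5, 8, 17, and 30 kHz scans.
FREQUENCIES = ["k5", "k8", "k17", "k30"]

Grid = List[List[float]]


def merge_scans(scans: Dict[str, Grid]) -> List[Tuple]:
    """Flatten four same-shaped voltage grids into one record per scan position.

    Each record is (row, column, k5, k8, k17, k30).
    """
    shapes = {(len(grid), len(grid[0])) for grid in scans.values()}
    if len(shapes) != 1:
        raise ValueError("all scans must cover the same n x m grid")
    n, m = shapes.pop()
    merged = []
    for i in range(n):
        for j in range(m):
            voltages = tuple(scans[name][i][j] for name in FREQUENCIES)
            merged.append((i, j) + voltages)
    return merged
```

Keeping the grid position in each record makes it possible to attach the known thickness loss of each specimen region afterwards.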

Approach: Data Transformation & Consistency (2 of 5)

[Figure: scan images at 5.5 kHz, 8 kHz, 17 kHz, and 30 kHz.]

Approach: Data Transformation & Consistency (3 of 5)

Results from PicView Program

[Figure: panels labeled Image20, Image60, Finagle, and Image25.]

Starred areas show which picture is used to model specific loss area.

Approach: Data Transformation & Consistency (4 of 5)

Specimen E1: EC Bitmap ≠ data set

0%  5%  7.5%  10%
12.5%
15%
17.5%
40%  35%  30%
45%  50%  27.5%
22.5%  20%  25%

Specimen E1: EC data set format

10%  12.5%  15%  17.5%
20%
22.5%
25%
45%  40%  7.5%
50%  35%  5%
30%  27.5%  0%

Approach: Data Transformation & Consistency (5 of 5)

[Figure: "Eddy Current Sensitivity to Milled Thickness Loss". Horizontal axis: percent material removed, 0 to 50. Series: 5.5 kHz, 8 kHz, 17 kHz, 30 kHz.]

[Figure: "Specimen E1, Eddy Current Scan Data Mapping Validation". Horizontal axis: percent material loss (%), 0 to 50. Vertical axis: average voltage response (V). Series: 5.5 kHz (k5), 8 kHz (k8), 17 kHz (k17), 30 kHz (k30).]

Captions: Graph from original study; Graph resulting from data transformation.

Approach: Model Development & Feature Selection

• Eddy Current
  – Four predictors and one response variable
  – Looked at histograms of variables to categorize the observation’s distribution
  – Used scaling and transformations of predictors
• Feature Selection (e.g., regression)
  – Stepwise, Forward, Backward selection
  – Maximum adjusted R², Mallows Cp

Approach: Model Training & Testing

• Calibration specimen data used for training ~ Eddy Current specimen E1
• Training and test data configuration (in general)
  – 75% training (120,456 observations)
  – 25% test (40,152 observations)
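
The two counts add up to 160,608 observations, and 75% of that total is exactly 120,456. The slides do not say how observations were assigned to each set, so the sketch below assumes a seeded random shuffle; the function name `split_indices` is hypothetical.

```python
import random


def split_indices(n_obs, train_fraction=0.75, seed=0):
    """Shuffle observation indices and cut them into training and test sets."""
    indices = list(range(n_obs))
    random.Random(seed).shuffle(indices)  # assumption: random assignment
    n_train = int(n_obs * train_fraction)
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = split_indices(120_456 + 40_152)
print(len(train_idx), len(test_idx))
```

Fixing the seed keeps the split reproducible, so that every modeling method is compared on the same test observations.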

Approach: Model Training Evaluation

• Akaike Information Criterion
• Schwarz Criterion
• Coefficient of multiple determination (R²)
• Adjusted R²
• Mallows Cp
• Mean Absolute Error
• Mean Squared Error
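
As a rough illustration of how these criteria can be computed from a fitted model's predictions, the sketch below uses the usual textbook forms for least squares regression. It assumes p counts all fitted parameters including the intercept, and that AIC and the Schwarz criterion are taken in their n·ln(SSE/n) form (constants dropped). The function name `fit_criteria` and the `mse_full` argument are hypothetical; the slides do not state which exact variants were used.

```python
import math


def fit_criteria(y, y_hat, p, mse_full=None):
    """Training-set criteria for a model with p parameters (intercept included).

    Requires SSE > 0 and n > p. `mse_full` is the error variance estimate from
    the largest candidate model and is only needed for Mallows Cp.
    """
    n = len(y)
    mean_y = sum(y) / n
    sse = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    sst = sum((a - mean_y) ** 2 for a in y)
    out = {
        "mae": sum(abs(a - b) for a, b in zip(y, y_hat)) / n,
        "mse": sse / n,
        "r2": 1 - sse / sst,
        "adj_r2": 1 - (n - 1) / (n - p) * sse / sst,
        "aic": n * math.log(sse / n) + 2 * p,
        "sbc": n * math.log(sse / n) + p * math.log(n),
    }
    if mse_full is not None:
        out["cp"] = sse / mse_full - (n - 2 * p)
    return out
```

Lower AIC, Schwarz criterion, MAE, and MSE are better, higher R² and adjusted R² are better, and a Cp value close to p suggests little bias in the candidate model.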

Approach: Model Selection

Selection of best modeling methodology based on root mean squared error calculation on test set

\[
\mathrm{MSE}_{\text{TEST SET}} = \frac{1}{N} \sum_{j=1}^{N} \left( y_j - \hat{y}_j \right)^2
\]

Algorithms: Multiple Regression (1 of 4)

Considered polynomial, interaction, and transformed terms

\[
Y_i = \beta_0 + \beta_1 X_{i,1} + \beta_2 X_{i,2} + \cdots + \beta_{p-1} X_{i,p-1} + \varepsilon_i
\]
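
Polynomial and interaction terms enter this linear model simply as extra X columns. The sketch below builds such columns from one observation's predictor values. It assumes a full polynomial expansion up to a chosen degree, which is one possible reading; the slides do not list the exact terms considered, and the function name `expand_terms` is hypothetical.

```python
from itertools import combinations_with_replacement


def expand_terms(row, degree=2):
    """Return the intercept term followed by every monomial up to `degree`.

    For predictors (a, b) and degree 2 the result is
    [1, a, b, a*a, a*b, b*b]; a*b is the interaction term.
    """
    terms = [1.0]
    for d in range(1, degree + 1):
        for combo in combinations_with_replacement(range(len(row)), d):
            value = 1.0
            for idx in combo:
                value *= row[idx]
            terms.append(value)
    return terms
```

With four predictors, a full expansion to 4th order already yields 70 columns, which is why feature selection matters before fitting.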

Algorithms: Regression Trees (2 of 4)

Least Squares example

[Figure: example regression tree. Split tests, each with YES/NO branches: Is k5 ≤ 1.16; Is k8 ≤ 0.85; Is k17 ≥ 0.42; Is k30 ≤ 0.25. Node statistics (Y mean value, std dev): 15, 0.425; 45, 0.334; 35, 0.401; 22.5, 0.396; 7.5, 0.297; 5, 0.185; 15, 0.501; 12.5, 0.446; 30, 0.420.]

Algorithms: Polynomial Networks (3 of 4)

[Figure: polynomial network diagram. Labels: Input A, Input B, Input C, Input D; Normalizers; 1st Layer, 2d Layer, 3d Layer; node types Single, Doublet, Triplet; Unitizers; Output.]

Algorithms: Ordinal Logistic Regression (4 of 4)

[Figure: cumulative probability curves P(Y ≤ 1), P(Y ≤ 2), and P(Y ≤ 3). Vertical axis: P(Y ≤ j), from 0 to 1. Horizontal axis: Predictor Value(s).]
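
The cumulative probability curves can be illustrated with a proportional-odds form, P(Y ≤ j | x) = 1 / (1 + exp(-(α_j - β·x))), with increasing cutpoints α_j. This is a sketch under that assumption; the slides do not give the exact parameterization or sign convention, and the function names and numeric values are hypothetical.

```python
import math


def cumulative_probs(x, cutpoints, coefs):
    """P(Y <= j | x) for each cutpoint, under a proportional-odds model."""
    eta = sum(b * v for b, v in zip(coefs, x))
    return [1.0 / (1.0 + math.exp(-(a - eta))) for a in cutpoints]


def class_probs(x, cutpoints, coefs):
    """P(Y = j | x), obtained by differencing the cumulative probabilities."""
    cum = cumulative_probs(x, cutpoints, coefs) + [1.0]
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]


# Illustrative values only: three cutpoints give four ordered categories.
probs = class_probs([0.0], cutpoints=[-1.0, 0.0, 1.0], coefs=[0.5])
```

Because the cutpoints are ordered, the cumulative curves never cross, and the class probabilities always sum to one.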

Results: Multiple Regression (1 of 7)

• The more complex the model, the better it did with both training and test datasets
• Best model incorporated transformed 4th order polynomial and interaction terms
• Problem ~ Heteroscedasticity (non-constant variance)

Results: Regression Trees (2 of 7)

• Program limitations for data size
  – 60,000 observation training dataset
  – 60,000 observation test dataset
• Least squares tree tested 2611 trees
• Least absolute deviation tested 172 trees

Results: Regression Trees (3 of 7)

• Least absolute deviation regression tree was the best
  – Fewer nodes: 819 vs. 1857
  – Smaller complexity value: –1.0 vs. 37.6
  – Smaller root MSE for test set
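
The difference between the two tree types lies in the node cost: a least squares tree predicts the node mean and minimizes squared deviations, while a least absolute deviation (LAD) tree predicts the node median and minimizes absolute deviations. The sketch below searches for the best single split under either cost. It is a generic illustration, not the program used in the study, and all names are hypothetical.

```python
from statistics import mean, median


def ls_cost(values):
    """Sum of squared deviations from the node mean."""
    center = mean(values)
    return sum((v - center) ** 2 for v in values)


def lad_cost(values):
    """Sum of absolute deviations from the node median."""
    center = median(values)
    return sum(abs(v - center) for v in values)


def best_split(x, y, cost):
    """Return (threshold, total_cost) for the best split 'x <= threshold'."""
    pairs = sorted(zip(x, y))
    best = None
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # no threshold can separate equal x values
        left = [p[1] for p in pairs[:k]]
        right = [p[1] for p in pairs[k:]]
        total = cost(left) + cost(right)
        threshold = (pairs[k - 1][0] + pairs[k][0]) / 2
        if best is None or total < best[1]:
            best = (threshold, total)
    return best
```

A full tree applies this search recursively to each resulting partition; the absolute-deviation cost is less sensitive to a few extreme responses than the squared cost.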

Results: Regression Trees (4 of 7)

Least Squares Regression Tree

Least Absolute Deviation Regression Tree

Results: Polynomial Networks (5 of 7)

Results: Ordinal Logistic Regression (6 of 7)

• The more complex the model, the better it did with both training and test datasets
• Best model incorporated transformed 4th order polynomial and interaction terms

Results: Model Selection (7 of 7)

Overall Model Comparison by Test Set

Model                          RT MSE    VAR
Multiple Regression Model 8    5.388     29.030
Logistic Regression Model 8    5.610     31.468
Polynomial Network             4.872     23.739
LAD Regression Tree            0.566     0.320
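
The two numeric columns in this comparison appear to be linked: each VAR value matches the square of the paired RT MSE value to the printed precision, which is consistent with RT MSE being the root mean squared error. A quick check, using the number pairs as printed on the slide (the helper name `consistent` is hypothetical):

```python
import math

# (RT MSE, VAR) pairs as printed on the slide.
reported = [(5.388, 29.030), (5.610, 31.468), (4.872, 23.739), (0.566, 0.320)]


def consistent(rt_mse, var, decimals=3):
    """True when sqrt(var), rounded to the printed precision, equals rt_mse."""
    return abs(round(math.sqrt(var), decimals) - rt_mse) < 1e-9


checks = [consistent(rt_mse, var) for rt_mse, var in reported]
print(checks)
```

On this reading, the smallest root MSE (0.566) is roughly a ninth of the next smallest (4.872).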

Interpretation of Results: More Complex = Better Model (1 of 3)

[Figure: "Stitching Effect". Axes x1 and x2, with regions labeled Y=0, Y=1, Y=2, Y=4.]

Dataset requires complex parametric models or non-parametric models

Interpretation of Results: More Complex = Better Model (2 of 3)

[Figure: STITCHING EFFECT]

Interpretation of Results: Why LAD RT does well (3 of 3)

LAD regression tree does well because:

– Robust in presence of heteroscedasticity

– Partitioning provides for improved accuracy

– Uses “stitching” to capture nonlinearities
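
One way to read "stitching" is that a tree joins many small constant-valued pieces into an approximation of a nonlinear surface. Under that assumed reading, the sketch below shows the approximation error of a piecewise-constant fit falling as the number of pieces grows. The response y = x² and the equal-width pieces are illustrative choices only, not taken from the study.

```python
def piecewise_constant_mse(x, y, n_pieces):
    """MSE when each y is predicted by the mean of its equal-width bin on [0, 1]."""
    bins = {}
    for xv, yv in zip(x, y):
        k = min(int(xv * n_pieces), n_pieces - 1)
        bins.setdefault(k, []).append(yv)
    sse = 0.0
    for values in bins.values():
        leaf_mean = sum(values) / len(values)
        sse += sum((v - leaf_mean) ** 2 for v in values)
    return sse / len(y)


x = [i / 100 for i in range(100)]
y = [v ** 2 for v in x]  # a smooth nonlinear response (illustrative)
errors = [piecewise_constant_mse(x, y, k) for k in (1, 2, 4, 8)]
print(errors)
```

A real regression tree chooses its partition boundaries from the data rather than using equal widths, so it can place more pieces where the response changes fastest.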

Conclusions (1 of 2)

Maintenance Operations

– Showed that an algorithm can be developed to assist operators in maintenance decisions

– Showed how to transform and clean the eddy current data for analysis

– Provided a basis for choosing among competing algorithms for actual implementation

Conclusions (2 of 2)

Methodological

– Provided a formal approach for comparing different data mining techniques on real corrosion data

– Showed that real data sets can produce highly complex relationships (contrast with Ockham’s razor) and that models can be found to handle these complexities

– Demonstrated the power of tree-based methods to treat nonlinearities in the data through “stitching”, which was formerly thought to be a disadvantage

Future Research

• Classification algorithms

• If time-lapse available ~ time series analysis

• Spatial models (correlation between corrosion areas)

• Other non-parametric techniques

• Application of a “known” naturally corroded specimen as test dataset

Questions ?

NASA’s “Vomit Comet”