A COMPARISON OF SOME ROBUST REGRESSION TECHNIQUES

Ezgi AVCI

TSE, Personnel and System Certification Center, TURKEY

Gülser KÖKSAL

METU, Industrial Engineering Department, TURKEY

54th EOQ Congress, Izmir, 26-27 October 2010



Outline

• Definition and Purpose of Regression

• Output of the Regression Process

• Regression Process Flow Diagram

• Why Alternative Regression Methods?

• Robustness

• Outliers

• Robust Regression Methods

• A Simulation Study

• A Case Study

• Conclusions

Regression

• Investigates and models the relationship between variables

• Application areas:

◦ Engineering

◦ Physical Sciences

◦ Life and Biological Sciences

◦ Social Sciences

Purpose of Regression

• To create an “equation” or “transfer function” from measurements of the system’s inputs and outputs acquired during a passive or active experiment.

• The transfer function is then used for:

- sensitivity analysis

- optimization of system performance

- tolerancing the system’s components

Regression

• Industrial applications:

◦ Quality control and improvement

e.g., ISO 9001:2008, Clause 8: Measurement, Analysis and Improvement

◦ Data mining

Output of Regression

• An estimate of the relative strength of the effect of each factor on the response

• An equation that analytically relates the critical parameters to the critical responses

• An estimate of how much of the total variation seen in the data is explained by the equation
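As an illustration of these three outputs, the sketch below fits an ordinary least squares (OLS) model with NumPy and returns the fitted coefficients (the equation, whose coefficients indicate each factor's effect) and R² (the share of total variation explained). The function name `ols_fit` and the toy data are assumptions made for illustration; they are not from the study.

```python
import numpy as np

def ols_fit(X, y):
    """Fit y = b0 + X @ b by ordinary least squares; return (coefficients, R^2)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])      # prepend an intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)   # least-squares coefficients
    fitted = A @ beta
    ss_res = np.sum((y - fitted) ** 2)             # variation left unexplained
    ss_tot = np.sum((y - y.mean()) ** 2)           # total variation around the mean
    return beta, 1.0 - ss_res / ss_tot

# Hypothetical measurements of one input x and one output y
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
beta, r2 = ols_fit(x, y)
print("intercept, slope:", beta, " R^2:", r2)
```

With several factors measured on different scales, the coefficients are usually compared after standardizing the inputs, so that their magnitudes reflect relative strength rather than units.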

Regression Process Flow Diagram

1. Select the system to be assessed. (INPUT)

2. Select the environment in which the data will be collected.

3. Select the inputs and outputs to be measured in a passive or active experiment.

4. Select and qualify the measurement systems used to acquire the data.

5. Run the system in the prescribed environment and acquire the data as the inputs vary.

Regression Process Flow Diagram (continued)

6. Inspect the data for outliers and remove them if a root cause justifies their removal.

7. Postulate and build a functional relationship between the inputs and the output.

8. Test the statistical adequacy of the functional relationship.

9. Test the predictive ability of the functional relationship on the physical system.

10. Result: the transfer function that analytically relates the inputs to the outputs. (OUTPUT)

Why Alternative Regression Methods?

• The assumptions of ordinary least squares (OLS) regression are not easy to satisfy in practice

• Violations of the normality assumption

• Outliers!

• Robust regression as an alternative

Ignoring Outliers

• The Challenger Accident:

Thiokol engineers argued that if the O-rings were colder than 53 °F (12 °C), they did not have enough data to determine whether the joint would seal properly.

The shuttle and external tank did not actually “explode”. Instead, they rapidly disintegrated under tremendous aerodynamic forces, since the shuttle was slightly past “Max Q”, or maximum aerodynamic pressure.

Outliers

• Definition: an outlier is an observation that appears to deviate markedly from the other members of the sample in which it occurs.

[Figure: data with outlier]

[Figure: data without outlier]
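The two figures did not survive extraction. A minimal numerical sketch of the same idea, using invented data: nine points lie exactly on y = 2x + 1, and moving only the tenth point far below the line is enough to flip the sign of the OLS slope.

```python
import numpy as np

x = np.arange(10, dtype=float)
y_clean = 2.0 * x + 1.0        # every point exactly on y = 2x + 1
y_dirty = y_clean.copy()
y_dirty[9] = -21.0             # one gross outlier at the largest x (true value 19)

# np.polyfit with degree 1 returns [slope, intercept]
slope_clean, intercept_clean = np.polyfit(x, y_clean, 1)
slope_dirty, intercept_dirty = np.polyfit(x, y_dirty, 1)
print(slope_clean, slope_dirty)   # slope drops from 2.0 to about -0.18
```

A single observation out of ten shifts the slope by 40 × (9 − 4.5) / 82.5 ≈ 2.18, because OLS squares the residuals and the outlier also sits at the edge of the x range.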

Two common ways to detect outliers

1. Regression diagnostics:

Multiple outliers are hard to detect, because they can mask one another.

2. Robust regression:

Outliers are easy to detect by their large residuals from the robust fit.
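A sketch of the second approach, under stated assumptions: fit a robust line, then flag observations whose residuals are large relative to a robust scale. Here the robust fit is least absolute deviation (LAD) for a single predictor, computed exactly by trying every line through two data points (some optimal LAD line passes through two observations); the scale is 1.4826 × MAD, and the cutoff of 2.5 is a common convention, not a value taken from the study. The helper names `lad_line` and `flag_outliers` are hypothetical.

```python
import itertools
import numpy as np

def lad_line(x, y):
    """Exact LAD line for one predictor by enumerating lines through two points."""
    best = (np.inf, 0.0, 0.0)
    for i, j in itertools.combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # a vertical line is not a function of x
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        loss = np.sum(np.abs(y - intercept - slope * x))  # sum of absolute residuals
        if loss < best[0]:
            best = (loss, intercept, slope)
    return best[1], best[2]

def flag_outliers(x, y, cutoff=2.5):
    """Return indices whose robust standardized residual exceeds `cutoff`."""
    intercept, slope = lad_line(x, y)
    r = y - intercept - slope * x
    scale = 1.4826 * np.median(np.abs(r - np.median(r)))  # MAD, consistent for normal errors
    if scale == 0:
        raise ValueError("robust residual scale is zero")
    return np.flatnonzero(np.abs(r) / scale > cutoff), slope

x = np.arange(10, dtype=float)
y = 2 * x + 1 + 0.1 * (-1) ** np.arange(10)   # small +/-0.1 scatter around y = 2x + 1
y[9] = -21.0                                  # one gross outlier
idx, slope = flag_outliers(x, y)
print(idx, slope)
```

The pairwise search costs O(n³) and is only meant for small illustrations; practical LAD fits use linear programming.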

What to do with outliers?

• Delete them?

• Ignore them?

• Give them less weight?

• Robust regression methods provide a “smooth transition between full acceptance and full rejection of an observation”.

• The best rejection procedures are not competitive with the best robust procedures.
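The contrast between rejection and a “smooth transition” can be seen in the weight each rule gives a standardized residual u. The sketch below compares a hard rejection rule (cutoff 2.5, an assumed value) with Huber's weight function, using the commonly quoted tuning constant c = 1.345; both function names are hypothetical.

```python
def hard_rejection_weight(u, cutoff=2.5):
    """Full acceptance (weight 1) or full rejection (weight 0) of a standardized residual."""
    return 1.0 if abs(u) <= cutoff else 0.0

def huber_weight(u, c=1.345):
    """Huber weight: 1 inside [-c, c], then decreasing smoothly as c / |u|."""
    return 1.0 if abs(u) <= c else c / abs(u)

for u in [0.5, 1.345, 2.0, 2.69, 5.0, 20.0]:
    print(f"u={u:6.3f}  hard={hard_rejection_weight(u):.3f}  huber={huber_weight(u):.3f}")
```

The hard rule jumps from 1 to 0 at the cutoff, so a point just inside and a point just outside are treated completely differently; the Huber weight declines gradually, so no single observation is either trusted fully or discarded outright.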

Robust Regression Methods

• Least Absolute Value (LAV)

• Huber M-method

• MM method

• Least Median of Squares (LMS)

• Least Trimmed Squares (LTS)

• Multivariate Adaptive Regression Splines (MARS)

• Locally Weighted Scatterplot Smoothing (LOESS)
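To make one of these methods concrete, here is a minimal sketch of a Huber M-estimate computed by iteratively reweighted least squares (IRLS), starting from OLS and re-estimating the residual scale (1.4826 × MAD) at every iteration. This is one common way to compute the estimator, assumed here for illustration; the function name `huber_m_fit`, the iteration limits, and the toy data are hypothetical, and the study's own software settings are not known.

```python
import numpy as np

def huber_m_fit(X, y, c=1.345, n_iter=100, tol=1e-10):
    """Huber M-estimate of [intercept, coefficients] by IRLS."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])        # intercept + predictors
    beta = np.linalg.lstsq(A, y, rcond=None)[0]      # start from the OLS fit
    for _ in range(n_iter):
        r = y - A @ beta
        scale = 1.4826 * np.median(np.abs(r - np.median(r)))  # robust MAD scale
        if scale == 0.0:
            break                                    # most points already fitted exactly
        u = np.abs(r) / scale                        # standardized residuals
        w = np.minimum(1.0, c / np.maximum(u, 1e-12))  # Huber weights
        sw = np.sqrt(w)
        # Weighted least squares step: scale rows by sqrt(weight)
        beta_new = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)[0]
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

x = np.arange(10, dtype=float)
y = 2 * x + 1 + 0.1 * (-1) ** np.arange(10)   # inliers scattered +/-0.1 around y = 2x + 1
y[9] = -21.0                                   # one gross outlier
beta_huber = huber_m_fit(x, y)
beta_ols = np.polyfit(x, y, 1)[::-1]           # [intercept, slope] for comparison
print("Huber-M:", beta_huber, " OLS:", beta_ols)
```

Because the Huber weight of the outlier shrinks to roughly c × scale / |residual|, its pull on the fit stays bounded, whereas OLS lets it dominate. Huber-M bounds the influence of large residuals but not of high-leverage points, which is one reason methods such as LTS and MM are also in the list.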

A Simulation Study

• Simulation is a commonly used tool for comparing robust regression techniques.

• The seven robust regression methods are compared using several performance measures under a range of scenarios.

• The results are discussed and the most promising robust methods are identified.
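The study's scenarios and performance measures are not reproduced here. As a much simpler, assumed analogue of such a simulation, the sketch below uses the intercept-only model, where the OLS estimate is the sample mean and the LAD estimate is the sample median, and compares their mean squared error by Monte Carlo, with and without contamination. The function name, sample size, contamination rate, and shift are all hypothetical choices.

```python
import numpy as np

def simulate_location_mse(n=50, reps=2000, eps=0.0, shift=10.0, seed=1):
    """Monte Carlo MSE of the mean (OLS) and the median (LAD) in a location model.

    Each observation is N(0, 1); with probability `eps` it is shifted by `shift`
    (a simple contamination scenario). The true location is 0.
    """
    rng = np.random.default_rng(seed)
    data = rng.normal(0.0, 1.0, size=(reps, n))
    contaminated = rng.random((reps, n)) < eps
    data[contaminated] += shift
    mse_mean = np.mean(data.mean(axis=1) ** 2)
    mse_median = np.mean(np.median(data, axis=1) ** 2)
    return mse_mean, mse_median

for eps in (0.0, 0.1):
    mse_mean, mse_median = simulate_location_mse(eps=eps)
    print(f"eps={eps:.1f}  MSE(mean/OLS)={mse_mean:.4f}  MSE(median/LAD)={mse_median:.4f}")
```

With clean normal data the mean wins, because it is the efficient estimator; with 10% contamination the median's error is far smaller. A regression study follows the same pattern, with coefficient vectors in place of a single location.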

The Results of the Simulation Study

• The most promising methods:

◦ OLS

◦ Huber-M

◦ LAV

◦ LTS

• These methods are compared on an industrial data set.
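Of the promising methods, LTS is perhaps the least familiar: it minimizes the sum of the h smallest squared residuals, so up to n − h points can be ignored entirely. The sketch below is a simplified version of the usual random-start plus concentration-step (“C-step”) search, assumed here for illustration; the function name `lts_fit`, the default h = (n + p + 1) // 2, the number of starts, and the data are choices made for this example, not details from the study.

```python
import numpy as np

def lts_fit(X, y, h=None, n_starts=200, n_csteps=20, seed=0):
    """Least trimmed squares via random elemental starts plus concentration steps."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])
    n, p = A.shape
    if h is None:
        h = (n + p + 1) // 2                   # roughly half the data, high breakdown
    rng = np.random.default_rng(seed)
    best_obj, best_beta = np.inf, None
    for _ in range(n_starts):
        idx = rng.choice(n, size=p, replace=False)          # elemental start: p points
        beta = np.linalg.lstsq(A[idx], y[idx], rcond=None)[0]
        for _ in range(n_csteps):
            keep = np.argsort((y - A @ beta) ** 2)[:h]      # h smallest squared residuals
            beta = np.linalg.lstsq(A[keep], y[keep], rcond=None)[0]  # refit on them
        obj = np.sort((y - A @ beta) ** 2)[:h].sum()        # trimmed sum of squares
        if obj < best_obj:
            best_obj, best_beta = obj, beta
    return best_beta, best_obj

x = np.arange(10, dtype=float)
y = 2 * x + 1 + 0.1 * (-1) ** np.arange(10)
y[7:] = [-20.0, -22.0, -19.0]     # three gross outliers (30% of the data)
beta, obj = lts_fit(x, y)
print(beta, obj)
```

Each C-step can only lower (or keep) the trimmed sum of squares, so the inner loop settles quickly; the random starts guard against settling on a subset that contains outliers.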

The Logic of LAD

• Least Absolute Deviation (LAD), also called Least Absolute Value (LAV):
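The slide's formula did not survive extraction. The standard LAD criterion, set against the OLS criterion, is shown below, with assumed notation: y_i is the response, x_i the vector of inputs for observation i, and β the coefficient vector.

```latex
\hat{\beta}_{\mathrm{OLS}} = \arg\min_{\beta} \sum_{i=1}^{n} \left( y_i - \mathbf{x}_i^{\top}\beta \right)^2,
\qquad
\hat{\beta}_{\mathrm{LAD}} = \arg\min_{\beta} \sum_{i=1}^{n} \left| y_i - \mathbf{x}_i^{\top}\beta \right|
```

Because the LAD loss grows linearly rather than quadratically in the residual, one large residual pulls the LAD fit far less than it pulls the OLS fit. In the intercept-only case the OLS estimate is the sample mean and the LAD estimate is the sample median.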

Description of the Data Set

• Our data are taken from a real-life manufacturing process that includes the sub-processes core making, molding, melting, casting, fettling, and painting.

• The dependent variable is the percentage of defectives on a cylinder head.

• Missing values were eliminated using appropriate methods.

• The basic data set includes 36 independent variables and 92 observations.

CONCLUSIONS

• For our real-life data, there is no significant difference between the robust methods and the classical OLS method.

• We attribute this to the complexity of the data and the presence of irrelevant variables.

• Moreover, even though the OLS and robust regression results are the same, the model fitted by OLS is not valid here, because the normality assumption on which its inference relies is not satisfied.

• Therefore, robust methods are the safest way to deal with outliers: even when they perform the same as the classical methods, they do not rely on such strict assumptions.

THANK YOU…