
Page 1

Advanced Regression with JMP PRO

German JMP User Meeting

Holzminden – June 22, 2017

Silvio Miccio

Page 2

Overview

• Introduction

• Some Details on Parameter Estimation and Model Selection

• Generalized Linear Models

• Penalized Regression Models in JMP PRO

• Example:

• Analysis of Time to Event Data (Parametric Survival Models)

• Classification Model with Missing Informative Data

• Linear Mixed Models in JMP PRO

• Example:

• Nested Intercept

• Repeated Measures (Consumer Research)

Page 3

Introduction

• Multiple Linear Regression (MLR) is one of the most commonly used methods in Empirical Modeling

• MLR is highly efficient as long as all assumptions are met

• Observational data in particular often do not meet these assumptions, resulting in problems with the estimation of coefficients and with model selection, and thereby in reduced model validity

• Hence, advanced regression methods, such as those available in JMP PRO, have to be applied in order to still benefit from the ease of interpretation of regression methods

Page 4

Linear Regression

• yi = Response; i = 1, 2….n (n = number of observations)

• xji = Factor, Predictor; j = 1, 2….p (p = number of factors)

• β0 = Intercept

• βj = Coefficients

• εi = Error

y_i = β_0 + β_1 x_1i + ⋯ + β_p x_pi + ε_i

(interaction terms such as β_12 x_1i x_2i and quadratic terms such as β_11 x_1i² can be included as additional model terms)

Page 5

Assumptions

1. Errors are normally distributed: ε ~ N(0, σ²)

2. Errors are independent (e.g. no pattern in residuals over time or groups)

3. Homoscedasticity: the variance is constant over the entire model space

4. Factors are not, or only slightly, correlated (rule of thumb: VIF < 3 or 5)

5. Predictors are fixed factors, measured with almost “no” error; the error is assumed to be completely on the side of the residuals

6. The response is a linear combination of coefficients and factors

Page 6

Page 7

Some Details on Linear Regression: Parameter Estimation

Page 8

Linear Regression

For generalization of the parametric model of a linear regression it makes sense to change to matrix notation:

y = Xβ + ε

y_i = β_0 + β_1 x_1i + ⋯ + β_p x_pi + ε_i

with

y = (y_1, y_2, …, y_n)'   the n × 1 vector of responses

X = | 1  x_11  x_21  ⋯  x_p1 |
    | 1  x_12  x_22  ⋯  x_p2 |
    | ⋮    ⋮     ⋮         ⋮  |
    | 1  x_1n  x_2n  ⋯  x_pn |   the n × p matrix of the factors/variables

β = (β_0, β_1, …, β_p)'   the p × 1 vector of unknown constants

ε = (ε_1, ε_2, …, ε_n)'   the n × 1 vector of random errors, N(0, σ²)

Page 9

Standard Least Squares Estimate

L = ε'ε = Σ_{i=1}^n ε_i²

L = (y − Xβ)'(y − Xβ)                    [(AB)' = B'A']
L = y'y − β'X'y − y'Xβ + β'X'Xβ          [β'X'y = y'Xβ]
L = y'y − 2β'X'y + β'X'Xβ                (quadratic function)

Setting the derivative to zero:

∂L/∂β = −2X'y + 2X'Xβ = 0   ⟹   X'Xβ = X'y   ⟹   β̂ = (X'X)⁻¹X'y
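As a cross-check of the algebra above, the closed form can be computed directly. A minimal numpy sketch on synthetic data (illustrative, not from the handout):

import numpy as np

# Closed-form OLS estimate: beta_hat = (X'X)^-1 X'y.
rng = np.random.default_rng(1)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])  # intercept + 3 factors
beta_true = np.array([2.0, 1.5, -0.5, 0.3])
y = X @ beta_true + rng.normal(scale=0.5, size=n)

# Solve the normal equations X'X beta = X'y; numerically preferable
# to explicitly inverting X'X.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # close to beta_true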

Page 10

X Matrix (coded) – 2³ FF Design, 3 Center Points

• X matrix of a full factorial 2³ design with three center points

• 1st column intercept

• 2nd – 4th column main effects

• 5th – 7th column interactions

X =

1 -1 -1 -1 1 1 1

1 1 -1 -1 -1 -1 1

1 -1 1 -1 -1 1 -1

1 1 1 -1 1 -1 -1

1 -1 -1 1 1 -1 -1

1 1 -1 1 -1 1 -1

1 -1 1 1 -1 -1 1

1 1 1 1 1 1 1

1 0 0 0 0 0 0

1 0 0 0 0 0 0

1 0 0 0 0 0 0

Int X1 X2 X3 X1X2 X1X3 X2X3

Page 11

X’X (Covariance Matrix)

Page 12

(X’X)-1 Inverted Covariance Matrix

• The “degrees of freedom” for estimating the model coefficients show up on the diagonal

• This is only true if all off-diagonal elements are 0 (all factors are independent of each other)

• When off-diagonal elements are not zero, the factors are correlated

1/11 0 0 0 0 0 0

0 1/8 0 0 0 0 0

0 0 1/8 0 0 0 0

0 0 0 1/8 0 0 0

0 0 0 0 1/8 0 0

0 0 0 0 0 1/8 0

0 0 0 0 0 0 1/8
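To make these three slides concrete, a small numpy sketch that rebuilds this design matrix and verifies that X'X is diagonal with the inverse shown above (the run order differs from the table, which does not change X'X):

import numpy as np
from itertools import product

# 8 factorial runs of the 2^3 design plus 3 center points.
runs = np.array(list(product([-1, 1], repeat=3)), dtype=float)
main = np.vstack([runs, np.zeros((3, 3))])
inter = np.column_stack([main[:, 0] * main[:, 1],   # X1X2
                         main[:, 0] * main[:, 2],   # X1X3
                         main[:, 1] * main[:, 2]])  # X2X3
X = np.column_stack([np.ones(11), main, inter])     # Int, X1..X3, interactions

XtX = X.T @ X
print(np.allclose(XtX, np.diag(np.diag(XtX))))      # True: all off-diagonals are 0
print(np.diag(np.linalg.inv(XtX)))                  # [1/11, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8]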

Page 13

[Figure: orthogonal factor space vs. multicollinearity]

Page 14

Effects of Multicollinearity

• Singular matrix → no solution

• High variance in the coefficients

• High variance in the predictions

• Often a high R-square, although (all) factors are insignificant

• Small changes in the data may have a big effect on the coefficients (not robust)

• Best subset selection, e.g. via Stepwise Regression, may become almost impossible (a VIF check, sketched below, helps to diagnose this)
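A minimal numpy sketch of the variance inflation factor named in assumption 4, VIF_j = 1/(1 − R_j²), where R_j² comes from regressing factor j on the remaining factors (the function name is illustrative):

import numpy as np

def vif(X):
    """VIF per column of the factor matrix X (without intercept column)."""
    out = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(len(X)), others])   # regress x_j on the rest
        coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
        resid = X[:, j] - A @ coef
        r2 = 1 - resid.var() / X[:, j].var()
        out.append(1 / (1 - r2))
    return np.array(out)   # values above ~3-5 flag multicollinearity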

Page 15

Some Details on Linear Regression: Model Selection

Page 16

Model Selection

• The overall goal in Empirical Modeling is to identify the model with the lowest expected prediction error:

Expected Prediction Error =
Irreducible Error (inherent noise of the system)
+ Squared Bias (depends on model selection)
+ Variance (depends on model selection)

• This requires finding the model with the optimum complexity (e.g. number of factors, number of sub-models, functional form of model terms, modeling method)

• Model Selection: “estimating the performance of different models in order to choose the (approximately) best one”

Page 17

Bias-Variance Trade-Off

• If model complexity is too low, the model is biased (important features of the system are not captured by the model)

• If model complexity is too high, the model is fit too hard to the data, which results in poor generalization of the predictions (high prediction variance)

• The challenge is to identify the model with the optimum trade-off between bias and variance

• Training error: variation in the data not explained by the model

• Test error: expected prediction error based on independent test data

Page 18

Methods for Model Selection

• When it is not possible to split the data into training, validation and test sets (as is the case for designed experiments or for small data sets), model selection can be done via measures that approximate the validation/test error, such as AIC, AICc and BIC (see the formulas below)

• Here the estimated value itself is usually not of direct interest; it is the relative size that matters

• Alternative methods based on re-sampling (e.g. cross validation) provide a direct estimate of the expected prediction error (and can be used for model assessment)
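The slide does not define these criteria; their standard forms, with L̂ the maximized likelihood, k the number of estimated parameters and n the number of observations, are:

AIC  = −2 log L̂ + 2k
AICc = AIC + 2k(k + 1) / (n − k − 1)
BIC  = −2 log L̂ + k log n

Lower values are better; AICc corrects AIC for small samples, and BIC penalizes complexity more strongly as n grows.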

Page 19

Generalized Linear Models: Modeling discrete responses and non-normally distributed errors

Page 20

Generalized Linear Model (GLM)

• A GLM is a generalization of the linear model for non-normal responses, where the error variance is a function of the mean:

o Binomial – dichotomous data (yes/no, pass/fail)

o Poisson – count data

o Log-Normal – data restricted to non-negative values (the transformed data are normally distributed)

o and many more…

• Components of GLM

1. Random Component

2. Systematic Component

3. Link Function

Page 21

Random Component

Identifies the distribution and variance of the response, usually from the exponential family of distributions, but not restricted to it. In exponential-family form (a standard expression, reconstructed here because the slide's formula did not survive the transcript):

f(y_i; θ_i, φ) = exp{ [y_i θ_i − b(θ_i)] / a(φ) + c(y_i, φ) }

The parameters θ_i and φ are the location and scale parameters.

Page 22

Systematic Component & Link Function

Systematic Component
A linear function of the factors, the so-called Linear Predictor, where the predictors can be transformed (squared, log, …):

η = β_0 + β_1 x_1 + ⋯ + β_p x_p

Link Function
Specifies the link between the random and systematic components. It is an invertible function that defines the relation between the response and the linear predictor:

η = g(μ),   μ = g⁻¹(η)
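A standard example (not spelled out on the slide): for a binomial response the logit link and its inverse are

η = g(μ) = log( μ / (1 − μ) ),   μ = g⁻¹(η) = 1 / (1 + e^(−η))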

Page 23

Common Variance and Link Functions
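The slide's table did not survive the transcript; a standard summary for reference:

Distribution   Canonical link g(μ)      Variance function V(μ)
Normal         identity: μ              1
Binomial       logit: log(μ/(1−μ))      μ(1−μ)
Poisson        log: log(μ)              μ
Gamma          reciprocal: 1/μ          μ²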

Page 24

Comparison Standard Least Squares vs. GLM

Standard Least Squares Regression

y = Xβ + ε,   β̂ = (X'X)⁻¹X'y

y is an n × 1 vector of responses

X is an n × p matrix of the factors/variables

β is a p × 1 vector of unknown constants

ε is an n × 1 vector of random errors N (0,σ2)

X′ is the transpose of X

X′X is a p × p matrix of correlations between the factors

η is the linear predictor

g-1 is the inverse link function

W is a diagonal matrix of weights wi

z is a response vector with entries zi

Iteratively Re-Weighted Least Squares

μ = g⁻¹(η),   η = Xβ + ε,   β̂ = (X'WX)⁻¹X'Wz
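A minimal IRLS sketch for a binomial GLM with logit link, illustrating the β = (X'WX)⁻¹X'Wz update on synthetic data (not the handout's example):

import numpy as np

def irls_logistic(X, y, n_iter=25):
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta                    # linear predictor
        mu = 1 / (1 + np.exp(-eta))       # inverse logit link
        w = mu * (1 - mu)                 # weights from the variance function
        z = eta + (y - mu) / w            # working response
        XtW = X.T * w                     # X'W without forming the diagonal matrix
        beta = np.linalg.solve(XtW @ X, XtW @ z)
    return beta

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])
p = 1 / (1 + np.exp(-(X @ np.array([-0.5, 1.0, 2.0]))))
y = rng.binomial(1, p)
print(irls_logistic(X, y))  # approaches [-0.5, 1.0, 2.0]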

Page 25

Generalized Linear Regression: Penalized Regression

Page 26

Generalized Linear Regression (GLR)

GLR can be seen as an extension of GLM, in addition being able to deal with:

• Multicollinearity, and to perform
• Model Selection (for p > n/2 as well as for p > n)

This is achieved by penalized regression methods, which attempt to fit better models by shrinking the parameter estimates.

Although shrunken estimates are biased, the resulting models are usually better, i.e. they have a lower prediction error.

Page 27

Ridge Regression

• Ridge Regression was developed as a remedy for multicollinearity

• It attempts to minimize the penalized residual sum of squares:

β̂_Ridge = argmin_β { Σ_{i=1}^n ( y_i − β_0 − Σ_{j=1}^p x_ij β_j )² + λ Σ_{j=1}^p β_j² }

• λ ≥ 0 is the regularization parameter controlling the amount of shrinkage

• λ Σ_{j=1}^p β_j² is called the L2 penalty, due to the squared term

Page 28

Ridge Regression

• Parameters are estimated according to:

β̂_Ridge = (X'X + λI)⁻¹ X'y

where I is a p × p identity matrix (diagonal elements are 1, all off-diagonal elements are 0)

• When there is a multicollinearity problem, the off-diagonal elements (covariances) of X'X are large compared to the values on the diagonal (variances)

• By adding λI to the covariance matrix, the diagonal elements increase
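A minimal numpy sketch of this closed form on synthetic, standardized data (the intercept is conventionally left unpenalized, which this sketch sidesteps by centering and scaling):

import numpy as np

def ridge(X, y, lam):
    """Ridge closed form: (X'X + lambda*I)^-1 X'y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

rng = np.random.default_rng(2)
X = rng.normal(size=(40, 5))
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = X @ np.array([3.0, -2.0, 0.0, 1.0, 0.5]) + rng.normal(size=40)
y = y - y.mean()

for lam in (0.0, 1.0, 10.0, 100.0):
    print(lam, np.round(ridge(X, y, lam), 2))  # coefficients shrink as lambda grows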

Page 29

What is Ridge Regression Doing?

• Since β̂ = (X'X + λI)⁻¹X'y, the diagonal elements of the inverted matrix get smaller, which means the parameter estimates β are shrunken

• As λ gets larger, the inverse gets smaller, meaning the variance in β decreases (which is desired), but only up to a certain point

• When λ gets too big, the residual sum of squares increases, because the coefficients are shrunken so much that they no longer explain the response (bias)

• Hence there is an optimum for λ

Page 30

LASSO Regression

• Since Ridge Regression shrinks large coefficients, which are potentially important, more than small coefficients, the LASSO was developed

• LASSO shrinks all coefficients by the same amount, but in addition shrinks “unimportant” factors to exactly zero, which means they are removed from the model (→ Model Selection)

• Like Ridge Regression, LASSO minimizes the error sum of squares, but using a different penalty:

β̂_LASSO = argmin_β { Σ_{i=1}^n ( y_i − β_0 − Σ_{j=1}^p x_ij β_j )² + λ Σ_{j=1}^p |β_j| }

Page 31

LASSO Regression

• λ Σ_{j=1}^p |β_j| is called the L1 penalty, due to the “first power” in the penalty term

• Parameter estimation for the LASSO is algorithmic, because there is no closed-form solution (the penalty is an absolute value and cannot be differentiated)

• The LASSO is designed for model selection, but does not cope well with collinearity

• Ridge is designed for multicollinearity, but does no model selection

• Hence, a combination of both methods is desirable

Page 32

Elastic Nets

• Elastic Nets combine the L1 and the L2 penalty

• The L1 penalty controls the model selection

• The L2 penalty

o Enables p > n

o Keeps groups of highly correlated variables in the model (the LASSO just picks one variable from the group)

o Improves and smooths the parameter estimation

β̂_EN = argmin_β { Σ_{i=1}^n ( y_i − β_0 − Σ_{j=1}^p x_ij β_j )² + λ₁ Σ_{j=1}^p β_j² + λ₂ Σ_{j=1}^p |β_j| }
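For illustration outside JMP, the same three penalties in scikit-learn (a stand-in, not the JMP PRO implementation; sklearn's l1_ratio plays the role of the Elastic Net Alpha described on a later slide):

import numpy as np
from sklearn.linear_model import Ridge, Lasso, ElasticNet

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 10))
X[:, 1] = X[:, 0] + 0.05 * rng.normal(size=100)   # a highly correlated pair
y = 2 * X[:, 0] + 2 * X[:, 1] + X[:, 2] + rng.normal(size=100)

for model in (Ridge(alpha=1.0), Lasso(alpha=0.1), ElasticNet(alpha=0.1, l1_ratio=0.9)):
    model.fit(X, y)
    print(type(model).__name__, np.round(model.coef_, 2))
# Typical outcome: the Lasso concentrates the effect on one of the correlated
# pair, while Ridge and the Elastic Net tend to keep both in the model.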

Page 33

Adaptive Methods

• Adaptive methods penalize “important” factors less than “unimportant” ones via a weighted penalty

• β̃_j is the maximum likelihood estimate if it exists: for normally distributed data the least squares estimate, or for non-normally distributed data the ridge solution

• Adaptive models attempt to ensure the Oracle Properties:

o Identification of the true active factors

o Correct estimation of the parameters

Adaptive L1 Penalty = λ Σ_{j=1}^p |β_j| / |β̃_j|

Adaptive L2 Penalty = λ Σ_{j=1}^p β_j² / β̃_j²
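A minimal sketch of the adaptive L1 penalty, assuming a ridge pilot estimate β̃ and using the standard column-rescaling trick to turn the weighted L1 problem into an ordinary LASSO (the function name is illustrative):

import numpy as np
from sklearn.linear_model import Lasso, Ridge

def adaptive_lasso(X, y, lam=0.1, pilot_alpha=1.0):
    beta_tilde = Ridge(alpha=pilot_alpha).fit(X, y).coef_  # pilot estimate
    w = np.abs(beta_tilde) + 1e-8           # weights; epsilon avoids division by zero
    theta = Lasso(alpha=lam).fit(X * w, y).coef_  # column j scaled by |beta_tilde_j|
    return theta * w                        # back-transform to the original scale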

Page 34

Tuning Parameters (from JMP Help)

• LASSO and Ridge are determined by one tuning parameter (on the L1 or L2 penalty)

• Elastic Nets are determined by two tuning parameters (L1 and L2), where the Elastic Net Alpha is the weight between the penalties

• The higher the tuning parameter, the higher the penalty (a value of zero gives the Maximum Likelihood solution (MLE), i.e. no penalty)

o When the tuning parameter is too small, the model is likely to overfit

o When the tuning parameter is too big, there is bias in the model

• To obtain a solution, the tuning parameter is increased over a fine grid

• The optimum solution is where the best fit over the entire tuning parameter grid is achieved (a cross-validated grid search is sketched below)
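A minimal sketch of such a grid search, using cross-validation as the fit criterion (illustrative; note the Model Tuning slide's caveat that k-fold validation is not recommended for DOE data):

import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(6)
X = rng.normal(size=(100, 8))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(size=100)

# LassoCV scans a fine grid of penalty values and keeps the one with the
# lowest cross-validated prediction error.
fit = LassoCV(n_alphas=100, cv=5).fit(X, y)
print(fit.alpha_)  # the selected tuning parameter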

Page 35

Tuning Parameters – Elastic Net Alpha (from JMP Help)

• Determines the mix between the L1 and L2 penalty

• The default value is 0.9, meaning the coefficient on the L1 penalty is set to 0.9 and the coefficient on the L2 penalty to 0.1

• If the Elastic Net Alpha is not set, the algorithm computes the Lasso, Elastic Net, and Ridge fits, in that order, and keeps the “best” solution

Page 36

Model Tuning

• Try different Estimation Methods, settings for Advanced Controls, and Validation Methods to find the best model

• All models are displayed in the model report and can be individually saved as script and prediction formula

• Note: k-fold or random holdback validation is not recommended for DOE data

Page 37

Data Set 1 - Parametric Survival Analysis

• 4 factors (E = Equipment Set-Up, P = Process Setting, F1 & F2 = Product Formulation) have been investigated in a designed experiment

• Column Censor includes the censoring variable (0 = no censoring, 1 = censoring)

• Response is the time the sample resists a force applied to it

• For feasibility reasons the measurement is stopped after a pre-defined maximum test time. This leads to so called “right censoring”, because not all samples fail within the maximum test time.

• Objective is to create a model for predicting the survival time of the sample

• The data file “GLR Survival” contains the scripts for the parametric survival model for JMP (does not allow for automated model selection) and JMP PRO
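For readers outside JMP, a hedged sketch of an analogous parametric (Weibull) survival fit with the lifelines package; the data and column names are illustrative. Note that lifelines expects an event indicator (1 = failure observed), so a Censor column coded 1 = censored must be inverted:

import numpy as np
import pandas as pd
from lifelines import WeibullAFTFitter

rng = np.random.default_rng(4)
n = 80
df = pd.DataFrame({
    "E":  rng.integers(0, 2, n).astype(float),  # equipment set-up (coded)
    "P":  rng.normal(size=n),                   # process setting
    "F1": rng.normal(size=n),                   # formulation factor 1
    "F2": rng.normal(size=n),                   # formulation factor 2
})
t = rng.weibull(1.5, n) * np.exp(0.5 * df["P"])  # synthetic survival times
max_time = 2.0                                   # pre-defined maximum test time
df["T"] = np.minimum(t, max_time)
df["event"] = (t <= max_time).astype(int)        # 1 = failed, 0 = right-censored

aft = WeibullAFTFitter().fit(df, duration_col="T", event_col="event")
aft.print_summary()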

Page 38

Data Set 2 - Credit Risk Scoring

• The data set is called Equity.jmp and is taken from the JMP Sample Data Library located in the JMP Help menu

• It is based on historical data gathered to determine whether a customer is a good or bad credit risk for a home equity loan (watch out for missing data: they are set to Informative Missing in JMP PRO, because they contain important information)

• Predictors:

• LOAN = amount of the loan

• MORTDUE = how much they need to pay on their mortgage

• VALUE = assessed valuation

• REASON = reason for loan

• JOB = broad job category

• YOJ = years on the job

• DEROG = number of derogatory reports

• DELINQ = number of delinquent trade lines

• CLAGE = age of oldest trade line

• NINQ = number of recent credit enquiries

• CLNO = number of trade lines

• DEBTINC = debt-to-income ratio

• The response is Credit Risk; the task is to predict good and bad credit risks

• Data file “Credit Risk” contains scripts for JMP PRO (GLR for model selection, informative missing, validation column) and JMP (logistic regression, it is possible to do stepwise and manual informative missing coding – see JMP home page for details)
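A sketch of the manual informative-missing coding mentioned above for a numeric predictor (pandas; values illustrative). Mean imputation plus a 0/1 missing indicator is assumed here as the coding convention, so the missingness itself can enter the model:

import numpy as np
import pandas as pd

df = pd.DataFrame({"DEBTINC": [33.0, np.nan, 41.5, np.nan, 28.7]})
for col in ["DEBTINC"]:
    if df[col].isna().any():
        df[col + "_missing"] = df[col].isna().astype(int)  # missingness as a factor
        df[col] = df[col].fillna(df[col].mean())           # impute the column mean
print(df)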

Page 39

Linear Mixed Models: G-Side and R-Side Random Effects

Page 40

Fixed Factors

• Usually the factors, e.g. in a designed experiment, are varied over fixed factor levels

• With fixed factors we can make statistical inferences within the investigated model space, based on the factor effects

• When the factor levels are randomly chosen from a larger population of factor levels, the factor is said to be a random factor

Page 41

Random Factors

• Random factors allow one to draw conclusions about the entire population of possible factor levels

• The population of possible factor levels is considered to be infinite

• Random effects models are of special interest for identifying sources of variation, because they allow one to identify variance components

• Examples of random effects: Machines, Operators, Panelists

• Random Effects also have to be considered for split plot designs, correlated responses, spatial data, repeated measurements

Page 42

Random Effects Model

y = Xβ + Zγ + ε

γ ~ N(0, G),   ε ~ N(0, R)

E(y) = Xβ;   Var(y) = ZGZ' + R

• y is a vector of responses

• X is the regression matrix of the fixed effects

• β is a vector of unknown fixed-effect parameters

• Z is the regression matrix of the random effects

• γ is a vector of unknown random-effect parameters

• ε is a vector of random errors (not required to be independent or homogeneous)

• G is the variance-covariance matrix of the random effects

• R is the variance-covariance matrix of the model errors

• G-side effects are specified by the Z matrix (random effects)

• R-side effects are specified by the covariance structure (repeated structure)
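A hedged sketch of a G-side random-intercept model in statsmodels, as a stand-in for JMP PRO's Mixed Model personality (data and names are illustrative; statsmodels' MixedLM covers G-side effects but only limited R-side covariance structures):

import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(5)
panelist = np.repeat(np.arange(20), 4)           # 20 panelists, 4 ratings each
u = rng.normal(scale=1.0, size=20)[panelist]     # random panelist effect (G-side)
x = np.tile([0, 1, 0, 1], 20)                    # within-panelist treatment
y = 5 + 0.8 * x + u + rng.normal(scale=0.5, size=80)
df = pd.DataFrame({"y": y, "x": x, "panelist": panelist})

# Fixed effect for x, random intercept per panelist.
fit = sm.MixedLM.from_formula("y ~ x", groups="panelist", data=df).fit()
print(fit.summary())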

Page 43

Repeated Covariance Structure Requirements (taken from JMP Help)

For details regarding different covariance structures, please see the JMP Help

Page 44

Strategies for Selecting Covariance Structure

• It is “always” possible to fit an unstructured covariance structure, but this also means fitting the most complex model (potential risk of overfitting)

• The best option is to use covariance structures that are expected to make sense in the given context (see the JMP Help for details regarding the structures)

• To find the best covariance structure among competing models, AICc and/or BIC can be used

• Check the structure of the residuals by plotting them or using the variogram

Page 45

Repeated Measures (from JMP Help)

• Repeated measures designs, also known as within-subject designs, model changes in a response over time or space while allowing errors to be correlated.

• Spatial data are measurements made in two or more dimensions, typically latitude and longitude. Spatial measurements are often correlated as a function of their spatial proximity.

• Correlated response data result from making several measurements on the same experimental unit. For example, height, weight, and blood pressure readings taken on individuals in a medical study, or hardness, strength, and elasticity measured on a manufactured item, are likely to be correlated. Although these measurements can be studied individually, treating them as correlated responses can lead to useful insights.

Page 46

Correlated Response (Example from JMP Help)

In this example, the effect of two layouts dealing with wafer production is studied for a characteristic of interest. Each of 50 wafers is partitioned into four quadrants and the characteristic is measured on each of these quadrants. Data of this type are usually presented in a format where each row contains all of the repeated measurements for one of the units of interest. Data of this type are often analyzed using separate models for each response. However, when repeated measurements are taken on a single unit, it is likely that there is within-unit correlation. Failure to account for this correlation can result in poor decisions and predictions. You can use the Mixed Model personality to account for and model the possible correlation.

The P&G example cannot be shared. Use the JMP data file “Wafer Stacked” from the JMP sample data library (not attached). It can be analyzed in JMP PRO only: an R-side random effect with a spatial covariance structure.