
Draft – Work in Progress – Confidential – Do Not Distribute

Generalized Inference

with Application to Small Sample Situations

Sam Weerahandi

(Joint Work with Kawai, Yu et al., Mathew et al.)


Outline

Motivation: Why Generalize?

Problems with Classical & Bayesian Inferences

Introduction to Generalized inference

About Mixed Models

Mixed Models: An Overview

Issues with MLE based Inference

Application: BLUP in Mixed Models

Performance Comparison

An Application


Motivation: Why Generalize Classical Inference?

STAT 200 teaches how to make ANY inference! Really?

Classical Approach to Inference (tests, confidence intervals, etc.) works fine with the mean μ and variance σ² of a Normal distribution

But it fails (MLE-based inferences are only asymptotic) with
• most functions of the mean and variance, except for a few simple functions
• advanced models such as Mixed Models and ANOVA with unequal variances

Classical Approach also fails to give small sample inferences with non-normal distributions: some functions of the parameters of the Uniform distribution U(α, β), the scale parameter of the Gamma distribution, the parameters of the Weibull distribution, etc.

One can find various solutions in the literature, but the approaches vary from one problem to another

What is desirable is a systematic approach that works with a greater class of functions of parameters


Further Issues with Classical Inference and Bayesian Inference

Classical Inference can provide only large sample inference for
• ANOVA with unequal variances
• Variance Components in Mixed Models
• BLUPs in Mixed Models

Classical Inference could yield wrong signs in Small Sample Inference
• In Multi-regional clinical trials, some regions could yield negative dose response due to chance
• Estimated response to a TV Ad could become negative in some markets even if there is no reason why the Ad would alienate any demographic segment

Bayesian Inference can provide small sample inference, but
• You need a prior
• When a non-informative prior is used with such algorithms as MCMC, it
– takes days to estimate when the model has a large number of parameters
– yields fairly different estimates when somewhat different hyperparameters are used
– yields fairly different estimates with different families of priors

Why not take the classical approach, but think like Bayesians?


Motivation (ctd.)

Multi-regional clinical trial example (ctd.)
• If you run LSE you may not even get the right sign for some Regions
• The problem could be alleviated by using the same data in a Mixed Model setting
• Then you will get much more reasonable estimates (or rather, BLUPs)

In fact, LSE could yield the wrong sign even with two parameters, as in a simulation from the exact model Y = 10 + 0.05 X + e with sample size 500 and e ~ N(0, 1)
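The wrong-sign effect in the two-parameter model can be sketched with a small simulation. The slide does not say how X was generated, so the Uniform(0, 1) design below is purely an assumption, as are the function names, the number of replications, and the seed. With a narrow X range the slope 0.05 is small relative to its standard error, so a noticeable share of fitted slopes can come out negative.

```python
import random

def ols_slope(xs, ys):
    """Least-squares slope of y on x (model with an intercept)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def wrong_sign_rate(n=500, reps=400, seed=1):
    """Fraction of simulated samples whose fitted slope is negative."""
    rng = random.Random(seed)
    negatives = 0
    for _rep in range(reps):
        # Assumption: X ~ Uniform(0, 1); the source does not state the design.
        xs = [rng.random() for _obs in range(n)]
        # Exact model from the slide: Y = 10 + 0.05 X + e, e ~ N(0, 1).
        ys = [10 + 0.05 * x + rng.gauss(0, 1) for x in xs]
        if ols_slope(xs, ys) < 0:
            negatives += 1
    return negatives / reps

rate = wrong_sign_rate()
print(f"share of samples with a negative slope estimate: {rate:.2f}")
```

A wider spread in X would shrink this share, so the number printed says more about the assumed design than about the original simulation.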

Mixed Models and BLUP (Best Linear Unbiased Predictor) are heavily used in high noise & small sample applications

But REML/ML frequently yield zero/negative variance components
• BLUPs fail or all become equal
• REML/ML could be inaccurate when the factor variance is relatively small


An Introduction to Generalized Inference

Classical Pivotals for interval estimation are of the form Q = Q(X, θ)

Generalized Inference on a parameter θ is based on a generalized pivotal of the form Q = Q(X, x, θ, δ), a function of the observable X, the observed x, the parameter θ, and nuisance parameters δ, satisfying
• the observed value Q(x, x, θ, δ) is free of δ
• Q has a distribution free of the unknown parameters θ and δ

Classical Extreme Regions are of the form Q(X, θ) < Q(x, θ) and cannot produce all extreme regions

Generalized Extreme Regions of the form Q(X, x, θ, δ) < Q(x, x, θ, δ) give a greater class of extreme regions

Generalized Test and Intervals are based on exact probability statements on Q

Generalized Estimators are based on transformed Generalized Pivotals

If Q or a transformation of it satisfies Q(x, x, θ, δ) = θ, then θ is estimated using E(Q), the expected value of Q, the median of Q, etc.


Generalized Inference: A Simple Example

Suppose you have a sample of size n from X ~ N(μ, σ²)

How to make Inferences about ρ = μ/σ, the reciprocal of the coefficient of variation, based on the Sample mean X̄ and the Sample Variance S²?

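Despite the simple distributional results (with S² the MLE of σ², i.e. divisor n)

Z = √n (X̄ − μ)/σ ~ N(0, 1)

and

U = n S²/σ² ~ χ² with n − 1 degrees of freedom,

if you start out with the MLE, X̄/S, it will lead to just asymptotic inferences

But note that

R = (x̄/s)(S/σ) − (X̄ − μ)/σ = (x̄/s) √(U/n) − Z/√n

is a Generalized Pivotal Quantity (GPQ), because (i) at the observed values R reduces to ρ, (ii) the distribution of R is free of unknown parameters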

So any inference is possible. For example, Pr(R ≤ ρ) yields an exact one-sided Generalized Confidence Interval (GCI)

In fact, the above is also a Classical CI, but the MLE failed to produce it

Note: An exact CI does not always exist, but you may still be able to obtain an exact GCI. In such cases GCIs tend to outperform more complicated approximations in terms of Repeated Sampling Properties
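A generalized confidence interval for ρ = μ/σ can be approximated by Monte Carlo. The sketch below assumes the GPQ takes the form R = (x̄/s) √(U/n) − Z/√n, where s is the observed MLE-type standard deviation (divisor n), Z ~ N(0, 1), and U ~ χ² with n − 1 degrees of freedom. That form satisfies conditions (i) and (ii), but the function names, the number of draws, and the seed are illustrative choices, not the author's implementation.

```python
import math
import random

def gpq_draws(xbar, s, n, draws=20000, seed=7):
    """Sorted Monte Carlo draws of R = (xbar/s) * sqrt(U/n) - Z/sqrt(n)."""
    rng = random.Random(seed)
    out = []
    for _ in range(draws):
        z = rng.gauss(0, 1)                      # Z ~ N(0, 1)
        u = rng.gammavariate((n - 1) / 2, 2.0)   # U ~ chi-square(n - 1)
        out.append((xbar / s) * math.sqrt(u / n) - z / math.sqrt(n))
    return sorted(out)

def generalized_ci(xbar, s, n, level=0.95):
    """Equal-tailed interval for rho = mu/sigma from the quantiles of R."""
    r = gpq_draws(xbar, s, n)
    lower = r[int((1 - level) / 2 * len(r))]
    upper = r[int((1 + level) / 2 * len(r)) - 1]
    return lower, upper

# Hypothetical observed summary statistics: xbar = 2.0, s = 1.0, n = 10.
lo, hi = generalized_ci(2.0, 1.0, 10)
print(f"approximate 95% GCI for mu/sigma: ({lo:.2f}, {hi:.2f})")
```

Only the observed x̄, s, and n enter the computation; no prior and no asymptotic approximation are involved, and the Monte Carlo error can be reduced by increasing `draws`.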


Generalized Inference (ctd.)

The case Q(x, x, θ, δ) = θ is too restrictive except for location parameters

More generally, if Q(x, x, θ, δ) = 0, then the solution of E{Q(X, x, θ, δ)} = 0 for θ is said to be the Generalized Estimate of θ

Note: As in classical estimation, one will have a choice of estimates and need to find one satisfying such desirable conditions as minimum MSE

A major advantage of GE is that, as in Bayesian Inference, it can assure, via conditional expectation, any known signs of parameters
• Variance components are positive
• The variance ratio in BLUP is between 0 and 1

GE can produce inferences based on exact probabilities for Distributions such as Gamma, Weibull, Uniform

To do so you DO NOT need a prior or to deal with hyperparameters

Read more about Generalized Inference at www.weerahandi.org and even read my second book FREE!

Application 1: Small Response Estimation when parameter sign is known

Problems with known signs of parameters often arise in practice:
• Price Elasticity of demand
• Response to promotional tactics
• Difference between Treatment and Placebo effects
• Adverse effect of a treatment

Assume that a regression parameter, θ, is supposed to be positive

Let θ̂ be the LSE of θ. Then T=

Suppose θ > a (e.g. a = 0 if the sign is known).

Kim (2008) showed that the Bayesian Estimate under an appropriate non-informative prior is

The above estimate is always positive

The same estimate can be obtained by considering the Generalized Pivotal Q= - (T- ) with observed value and taking the conditional expectation E(Q|Q>a)
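The idea of taking a conditional expectation E(Q | Q > a) can be illustrated in the simplest setting. The sketch assumes a pivotal of the form Q = θ̂ − sZ with Z ~ N(0, 1) and a known standard error s; the slide's T may well be Student-t distributed, so this is a simplified stand-in rather than the estimator from Kim (2008). Under that assumption Q is N(θ̂, s²) and the conditional mean is the familiar truncated-normal mean.

```python
import math

def normal_pdf(c):
    """Standard normal density."""
    return math.exp(-0.5 * c * c) / math.sqrt(2 * math.pi)

def normal_cdf(c):
    """Standard normal distribution function."""
    return 0.5 * (1 + math.erf(c / math.sqrt(2)))

def conditional_estimate(theta_hat, s, a=0.0):
    """E(Q | Q > a) for Q = theta_hat - s*Z, Z ~ N(0, 1), known s (assumed).

    Numerically reliable for moderate values of (theta_hat - a) / s only;
    far in the lower tail the normal_cdf term underflows.
    """
    c = (theta_hat - a) / s
    return theta_hat + s * normal_pdf(c) / normal_cdf(c)

# A least-squares estimate with the wrong sign is pulled back above a = 0,
# while an estimate far above the bound is left almost unchanged.
print(conditional_estimate(-1.0, 1.0))
print(conditional_estimate(5.0, 1.0))
```

The correction term s·φ(c)/Φ(c) is always positive, which is why the resulting estimate respects the known bound.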

Small Response Estimation (ctd)

Moreover, such classical estimators can be further improved by taking a Stein (1961)-like approach

Consider the class of estimators of the form

Find ks by Stein approach

The resulting estimator is denoted as IGE

As is evident from the MSE (Mean Squared Error) comparison, IGE is uniformly better than LSE when the parameter is known to be positive

MLE (truncated) can also be improved upon

In Interval Estimation the approach provides shorter intervals


MSE Performance when s=1


Applications in Mixed Models

Mixed Models are especially useful in applications involving
• large samples with noisy data
• small samples with low noise

In Clinical Research & Public Health Studies, Mixed Models can yield results of greater accuracy in estimating effects by
• treatment levels
• Patient groups

In Sales & Marketing, Mixed Models are heavily used to estimate Response due to promotional tactics:
– Advertisements (TV, Magazine, Web) by Market
– Doctors’ Response to Field Rep Detailing

In fact, if you don’t use Mixed Models in these types of applications you may get unreliable or junk estimates, tests, and intervals

So, the BLUP has replaced the LSE as the most widely used statistical technique


An Example

Suppose you are asked to estimate the effect of a TV/Magazine Ad by every Market/District using a model of longitudinal sales data on ad-stocked exposure
• If you run LSE you may not even get the right sign of estimates for 40% of Markets
• If you formulate the problem in a Mixed Model setting you will get much more reliable estimates
• So, use Mixed Models and BLUP instead of LSE

Mixed Models and the BLUP (Best Linear Unbiased Predictor) are heavily used in high noise & small sample applications

In analysis of promotions, SAS PROC MIXED or R/S+ lme is used more than any other procedure

But REML/ML frequently yield zero/negative variance components
• BLUPs fail or all become equal
• REML/ML could be inaccurate when the factor variance is relatively small


Overview of Mixed Models

Suppose certain groups/segments are distributed around their parent

Assumption in Mixed Models: Random effects are Normally distributed around the mean, the parent estimate, say M

Suppose Regression By Groups yields estimate Mi for Segment i

Let Vs be the between segment variance and Ve be the error variance, which are known as Variance Components

It can be shown that the Best Linear Unbiased Predictor (BLUP) of the Segment i effect is

(k Vs Mi + Ve M) / (k Vs + Ve),

a weighted average of the two estimates, where k is a known constant that depends on sample size and group data

The above is a shrinkage estimate that moves extreme estimates towards the parent estimate
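The shrinkage behaviour of the BLUP can be sketched numerically, assuming the weighted-average form (k Vs Mi + Ve M) / (k Vs + Ve) with known variance components. The segment names and all numeric values below are hypothetical and serve only to show the direction of the effect.

```python
def blup(m_i, m, v_s, v_e, k):
    """Shrinkage predictor (k*Vs*Mi + Ve*M) / (k*Vs + Ve)."""
    w = k * v_s / (k * v_s + v_e)   # weight on the segment's own estimate Mi
    return w * m_i + (1 - w) * m

segments = {"A": -0.4, "B": 0.1, "C": 0.9}   # hypothetical segment estimates Mi
parent = 0.2                                  # hypothetical parent estimate M

for name, m_i in segments.items():
    shrunk = blup(m_i, parent, v_s=0.05, v_e=1.0, k=10)
    print(f"segment {name}: own estimate {m_i:+.2f} -> BLUP {shrunk:+.2f}")
```

When Vs is small relative to Ve/k the weight on Mi approaches zero and every segment collapses to the parent estimate, which is the "all BLUPs become equal" behaviour seen when the estimated factor variance is zero.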

Problem in Mixed Model Inference

BLUP in Mixed model is a function of Variance Components

Classical estimates of Factor variance can become negative when noise (error variance) is large and/or sample size is small

Then, ML and REML fail: PROC MIXED will complain about non-convergence or will yield equal BLUPs for all segments

I tried the Bayesian approach with MCMC, but when I did a sanity check (i) by changing the hyperparameters OR (ii) by using a Gamma type prior in place of log-normal, I got very different estimates

After both the Classical & Bayesian Approaches failed me, I wrote a paper about “Generalized Point Estimation”, which
• Can assure estimates fall into the parameter space
• Can take advantage of known signs of parameters without any prior
• Can improve MSE of estimates by taking such classical methods as the Stein method


Estimating Variance Components and BLUPs

Generalized approach can produce the above estimate or better estimates

Generalized pivotal quantity

is a Generalized Estimator and E(Q)=0 yields the classical estimate

But the drawback of the classical estimate is that MLE/UE frequently yields negative estimates

The conditional E(Q|C)=0 with known information C yields

BLUPs are then obtained as weighted averages of the Least Squares Estimates of Parent and Child

For simplicity, consider a balanced Mixed Model. The inference problem in canonical form reduces to:


Comparison of Variance Estimation Methods (based on 10,000 simulated samples): Performance of MLE vs. GE

Assume a One-Way Random Effects model with
• k segments
• n data from each segment
• Degrees of freedom a = k − 1 and e = k(n − 1)

The variance component is estimated by the MLE and GE

Note that with small sample sizes MLE/UE yield negative estimates for the Variance Component

In such situations SAS does not provide estimates or BLUPs (it just says “did not converge”)


Comparison of Variance Estimation Methods: Performance of ML/REML vs. GE (ctd.)

Table below shows MSE performance of competing estimators of factor variance

Note that
• the Generalized estimate is better than any other estimate
• REML is not as good as ML

For estimation of the BLUP, Yu, Zou, Carlson, and Weerahandi (2013) provide similar improvements over ML and REML

GE based methods do not suffer from the zero variance drawback of ML and REML


Further Issues with BLUP

ML and REML Prediction Intervals for BLUP are highly conservative:
• Actual coverage of intervals intended to be 95% is as large as 100%
• This implies a serious lack of power in Testing of Hypotheses
• The drawback prevails unless the number of groups tends to infinity

Generalized Intervals proposed by Mathew, Gamage, and Weerahandi (2012) can rectify the drawback

Table below shows Performance of competing estimates


Application: Estimation of Response to TV Ads by Market

Data Preparation:
• Obtain TV GRP and weekly/monthly Sales data by market
• Ad-stock (e.g. http://en.wikipedia.org/wiki/Advertising_adstock) the TV GRP
• Obtain data for other variables that you want to control for
• De-mean all variables including ad-stocked GRP
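The ad-stocking and de-meaning steps can be sketched as follows. The slide does not specify the ad-stock form, so the simple geometric carry-over A_t = GRP_t + λ A_{t−1} is assumed here; the decay rate and the GRP series are hypothetical.

```python
def adstock(grps, decay):
    """Geometric ad-stock: A_t = GRP_t + decay * A_{t-1}, starting from A_0 = 0."""
    stocked = []
    carry = 0.0
    for g in grps:
        carry = g + decay * carry
        stocked.append(carry)
    return stocked

def demean(values):
    """Subtract the series mean so the variable is centred at zero."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

weekly_grp = [100, 0, 0, 50, 0]          # hypothetical weekly GRPs for one market
stocked = adstock(weekly_grp, decay=0.5)
centred = demean(stocked)
print(stocked)
print(centred)
```

In practice the decay rate would be chosen or estimated per medium, and the de-meaning would be applied within each market before fitting the Mixed Model.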

Approach to Modeling:
• Model Sales or log sales as a linear function of all explanatory variables, including trend and seasonality in sales
• Model the coefficients of ad-stocked GRP as random effects around the national average

Estimate the parameters of the Mixed Model by such methods as ML if there is no convergence problem, and by the proposed generalized method otherwise

Use estimated responses to TV to write down the profit function

Demo
