Page 1: Mallows Cp for Out-of-Sample Prediction

Lawrence D. Brown
Statistics Department, Wharton School, University of Pennsylvania
[email protected]

Purdue University, Nov. 11, 2016

Joint work with others in the Wharton Linear Models Research Group, including R. Berk, A. Buja, A. Kuchibotla and L. Zhao.


Page 2: Linear Model

• "Conventional" Linear Model (LM):
$$Y = X\beta + e; \quad Y, e \sim n \times 1,\ X \sim n \times r,\ \beta \sim r \times 1;$$
$$E(e_i) = 0,\quad e_i \text{ independent},\quad \operatorname{var}(e_i) = \sigma^2 \text{ (unknown)}.$$

• Variable selection: Choose a subset of variables = columns of $X$ = $X_{[p]}$. Reanalyze via Least Squares using $X_{[p]}$ under (LM). Get $\hat\beta_{[p]}$ and $\hat Y_{[p]} = X_{[p]}\hat\beta_{[p]}$.

• $C_p$ is a numerical measure often used to choose a good subset of variables.

Here $X$ is a matrix of constants (not random). The first column of $X$ is 1. So, ordinary regression is $p = 2$.

Page 3: Cp

Mallows' version for a sub-model of size $p$ is
$$C_p = \frac{SSE[p]}{\hat\sigma_r^2} - n + 2p.$$
Note that $\hat\sigma_r^2$ comes from the full model. This is used to compare the suitability of various sub-models. One can use either an all-subsets approach or a stepwise approach (just forward, or forward and backward, etc.).

Mallows (1964, 1966, oral presentations; 1973 Technometrics); Gorman and Toman (1966, Technometrics).
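As a concrete illustration (a minimal sketch of my own, not from the talk; the helper name `mallows_cp` and the synthetic data are hypothetical), the following Python computes $C_p$ for nested sub-models, with $\hat\sigma_r^2$ taken from the full-model residuals as the slide specifies:

```python
import numpy as np

def mallows_cp(X_full, y, cols):
    """Mallows Cp for the sub-model using columns `cols`; sigma_r^2 comes from the full model."""
    n, r = X_full.shape
    beta_full, *_ = np.linalg.lstsq(X_full, y, rcond=None)
    sigma2_r = np.sum((y - X_full @ beta_full) ** 2) / (n - r)   # full-model residual variance
    Xp = X_full[:, cols]
    beta_p, *_ = np.linalg.lstsq(Xp, y, rcond=None)
    sse_p = np.sum((y - Xp @ beta_p) ** 2)                       # SSE[p]
    return sse_p / sigma2_r - n + 2 * Xp.shape[1]                # SSE[p]/sigma_r^2 - n + 2p

# toy fixed design: intercept plus 3 covariates (made-up data)
rng = np.random.default_rng(0)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X @ np.array([1.0, 2.0, 0.0, 0.0]) + rng.normal(size=n)
for p in range(1, 5):                                            # nested sub-models: first p columns
    print(p, round(mallows_cp(X, y, list(range(p))), 2))
```

For the full model (p = 4) the printed value is exactly 4, the self-normalizing property $C_{[r]} = r$ mentioned later in the talk.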

Page 4: Example: Polynomial Regression

Cholesterol data from Efron & Feldman (1991), Efron (2013, JASA).

Y is improvement in cholesterol level over duration of study. X is a measure of compliance during study (normalized to approx. normal).

[Scatterplot of y against x.]

Page 5: Stepwise Regression

Entered in order of degree of polynomial. [Notation: $X^k = (X - \bar X)^k$]

Parameter  "Sig Prob"  RSquare  Cp     p  AICc    BIC
X          0.0000      0.4853   3.137  2  1477.3  1486.5
X^2        0.1988      0.4905   3.467  3  1477.7  1489.9
X^3        0.0816      0.5001   2.427  4  1476.7  1491.8
X^4        0.7572      0.5004   4.331  5  1478.7  1496.8
X^5        0.2654      0.5043   5.089  6  1479.7  1500.7
X^6        0.7656      0.5046   7.000  7  1481.8  1505.7

Cp suggests that the cubic model is best. Note that all-subsets chooses linear and cubic; no quadratic term.

Page 6: Alternate, Equivalent Form of Cp

Recall,
$$C_p = \frac{SSE[p]}{\hat\sigma_r^2} - n + 2p.$$
A more convenient form moving forward is
$$C_p^U = \frac{\hat\sigma_r^2}{n}\,(C_p + n) = \frac{SSE[p]}{n} + \frac{2p\,\hat\sigma_r^2}{n}.$$
Both select the same models. $C_p$ has a nice self-normalizing property: $C_{[r]} = r$. But $C_p^U$ has an unbiasedness property that resonates well later. Continue to note that $\hat\sigma_r^2$ is derived from full-model residuals.
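A tiny numerical check of this identity (a sketch with made-up values for SSE[p], n, and $\hat\sigma_r^2$; not data from the talk):

```python
import numpy as np

def cp_and_cpu(sse_p, p, n, sigma2_r):
    """Return (Cp, Cp^U) from a sub-model's SSE, its size p, n, and the full-model sigma_r^2."""
    cp = sse_p / sigma2_r - n + 2 * p
    cpu = sse_p / n + 2 * p * sigma2_r / n
    return cp, cpu

n, sigma2_r = 97, 0.5                                 # arbitrary illustrative values
for sse_p, p in [(190.0, 1), (60.0, 3), (44.0, 9)]:
    cp, cpu = cp_and_cpu(sse_p, p, n, sigma2_r)
    assert np.isclose(cpu, sigma2_r * (cp + n) / n)   # Cp^U = sigma_r^2 (Cp + n) / n
    print(f"p={p}: Cp={cp:.2f}, Cp^U={cpu:.4f}")
```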

Page 7: Example: Variable Selection

PROSTATE data from Tibshirani (1996, JRSS-B); data adapted from Stamey, et al. (1989).

• Response is Y = "lcavol" (= log Cancer Volume). There are 8 potential explanatory variables. n = 97.
• We used all-subsets with $C_p^U$. But I'll display as a step-wise table. In this example step-wise and all-subsets yielded the same models.

Note: I switched variables from the analysis in Tibshirani (1996); he used Y = lpsa, and lcavol was one of the potential covariates. We use the full data set; Tibshirani (1996) splits the data and uses only n = 71.

Page 8: Step-wise Table

Parameter  "Sig Prob"  Seq SS   RSquare  Cp      Cp^U   AICc
constant   0.000       176.785           176.84  1.977  310.3
lpsa       0.000        71.938  0.539     32.20  0.643  237.2
lcp        0.000        14.143  0.646      5.37  0.508  214.0
age        0.151         1.040  0.653      5.25  0.507  214.1
lbph       0.098         1.359  0.664      4.48  0.503  213.4
pgg45      0.283         0.567  0.667      5.32  0.507  214.5
gleason    0.162         0.957  0.675      5.37  0.508  214.8
svi        0.549         0.175  0.676      7.01  0.516  216.8
lweight    0.903         0.007  0.676      9.00  0.526  219.3

• Cp chooses a 4-factor model (same via step-wise, as above, or via all subsets).
• I'll later return to this example. [NOTE: I included the "constant" as a possible model. This corresponds to the model $Y = \beta_1$. This is not traditional in such a table.]

Page 9: Estimation of Predictive Risk

• The core concept within Mallows' method.

Mallows noted:
• $C_p$ is a [normalized] estimate of [excess] predictive risk, and also explicitly assumed
• The x's are taken to be from a fixed design, not from a random sample.

Page 10: Details

• Consider $X$ fixed, with linear estimator $\hat Y = X\hat\beta$ and a new (vector) observation $Y^* = X\beta + e^*$.

• The Total predictive Risk across the design is
$$TR_X := E\left(\big\|Y^* - X\hat\beta\big\|^2\right).$$
The average, per coordinate, is $R_X = n^{-1}TR_X$.

• THEN, as implicit in Mallows,
(U) $E\left(C_p^U\right) = R_X$. [Exact unbiasedness]

• Mallows has
(M) $C_p \approx \left(TR_X - n\sigma^2\right)\big/\sigma^2 = \left(\text{Excess total pred risk}\right)\big/\sigma^2$.
See Gilmour (1996, JRSS-D) for a slight modification.
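Property (U) is easy to see in a small Monte Carlo (a sketch of my own, assuming a well-specified homoscedastic linear model at a fixed design; the design, coefficients, and sub-model below are arbitrary): the average of $C_p^U$ over replications should match the per-coordinate predictive risk $R_X$ of the same sub-model.

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma = 50, 1.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])   # fixed design, r = 3
mu = X @ np.array([1.0, 0.5, 0.25])                          # true mean at the design
Xp, p, r = X[:, [0, 1]], 2, X.shape[1]                       # sub-model of size p = 2

cpu_vals, risk_vals = [], []
for _ in range(10000):
    y = mu + sigma * rng.normal(size=n)
    y_star = mu + sigma * rng.normal(size=n)                 # new responses at the same design
    beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)
    sigma2_r = np.sum((y - X @ beta_full) ** 2) / (n - r)    # full-model variance estimate
    beta_p, *_ = np.linalg.lstsq(Xp, y, rcond=None)
    sse_p = np.sum((y - Xp @ beta_p) ** 2)
    cpu_vals.append(sse_p / n + 2 * p * sigma2_r / n)        # Cp^U
    risk_vals.append(np.mean((y_star - Xp @ beta_p) ** 2))   # n^{-1}||Y* - X_p beta_p||^2

print("mean Cp^U :", round(np.mean(cpu_vals), 4))
print("R_X (MC)  :", round(np.mean(risk_vals), 4))           # the two should agree closely
```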

Page 11:

• Because of (M), many statisticians and statistical textbooks believe/claim that

"Cp is a suitable estimate of predictive risk." !!!

BUT

That's not true!

Page 12:

• Here's why it's not true... (one reason)
• The predictive goal is to predict (squared-error) risk for an individual who is not in the statistical sample.
• This is (with $X^*, Y^*$ denoting the new measurements)
$$\text{(predRisk)} \qquad R^*_{X[p]} := E\left\{\left(Y^* - X^{*T}_{[p]}\hat\beta\right)^2\right\}.$$
• This is not the same as
$$\text{(Mallows)} \qquad R_{X_{[p]}=\mathbf{X}} := n^{-1}E\left(\big\|Y^* - \mathbf{X}\hat\beta\big\|^2 \,\Big|\, X_{[p]} = \mathbf{X}\right),$$
which is the target of $C_p^U$.
• Why are these two different? Synopsis follows. Also see "Models as Approximations -- A Conspiracy of Random Regressors and Model Deviations Against Classical Inference in Regression," WLMRG.

Page 13: Estimating the Predictive Risk, R

Preamble:
• $C_p$ assumes a well-specified linear model with fixed regressors $X$ and homoscedasticity of residuals.
• The following is for random regressors; it estimates the predictive risk of linear estimators in such a setting.
• In addition, the following does not assume a well-specified linear model or homoscedasticity.

INSTEAD

• It holds in the "assumption lean" framework (see next slide).
• There is a small price relative to what one gets with $C_p$:
• The new estimator of risk is only an asymptotically unbiased estimator of its target.

Page 14: Assumption-Lean Linear Regression

• Observe an iid sample $\{X_i, Y_i\}$ with $X_i \in \mathbb{R}^r$, $(X, Y) \sim F$.
• No assumptions about $F$, other than existence of low-order moments.
• Contemplate a future observation $(X^*, Y^*) \sim F$, and
• Construct the best bi-linear predictor to minimize predictive risk, $R$. [Predictor of the form $X^{*T}\tilde\beta$ with $\tilde\beta$ linear in $Y$.]
• This is $X^{*T}\hat\beta$ with $\hat\beta$ the usual LS estimator.
• Notation: The target/oracle predictor is $X^{*T}\beta$ with $\beta$ being the population LS parameter:
$$\beta = \arg\min_b E_F\left\{\left(Y - X^T b\right)^2\right\} \;\Longleftrightarrow\; \beta = \left[E\left(XX^T\right)\right]^{-1}E\left(XY\right).$$
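To make the last display concrete (a hypothetical sketch; the data-generating F below is my own choice and deliberately not linear), the population LS parameter can be approximated by plugging sample moments into $[E(XX^T)]^{-1}E(XY)$, and it coincides with the large-sample least-squares fit:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 200_000                                       # large "population" draw from F
x = rng.normal(size=N)
y = x + 0.5 * x**2 + rng.normal(size=N)           # E(Y|X) is not linear: model is misspecified
X = np.column_stack([np.ones(N), x])              # working regressor vector (1, X)

ExxT = X.T @ X / N                                # moment estimate of E(X X^T)
Exy = X.T @ y / N                                 # moment estimate of E(X Y)
beta_pop = np.linalg.solve(ExxT, Exy)             # population LS parameter (approximated)

beta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)   # ordinary LS on the same draw
print(beta_pop, beta_ls)                          # essentially identical: the best linear approximation
```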

Page 15: Estimate of R[Γ]; Definition

• For a sub-model $\Gamma$, compute the sample LS residuals
$$\rho_{[\Gamma]i} = Y_i - X'_{[\Gamma]i}\hat\beta_{[\Gamma]}, \qquad i = 1, \ldots, n.$$
• Then compute the matrices
$$M_{[\Gamma]} = n^{-1}X'_{[\Gamma]}X_{[\Gamma]} \qquad \text{and} \qquad W_{[\Gamma]} = n^{-1}\sum_{i=1}^n X_{[\Gamma]i}X'_{[\Gamma]i}\,\rho_{[\Gamma]i}^2.$$
• The estimate of $R[\Gamma]$, the predictive risk for sub-model $\Gamma$, is
$$C_p^{\oplus} = n^{-1}\big\|Y - X_{[\Gamma]}\hat\beta_{[\Gamma]}\big\|^2 + 2n^{-1}\operatorname{tr}\left(M_{[\Gamma]}^{-1}W_{[\Gamma]}\right) = n^{-1}SSE[\Gamma] + 2n^{-1}\hat\varsigma_{[\Gamma]}^2.$$
(See derivation on the next pages.)
• Compare this to $C_p^U = n^{-1}SSE[\Gamma] + 2n^{-1}\left(p\,\hat\sigma_r^2\right)$.
The difference is the last term, which estimates variance.
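A direct transcription of this definition (a sketch of my own; the heteroscedastic data below are invented to mimic the assumption-lean setting, and the function name `cp_oplus` is not from the talk):

```python
import numpy as np

def cp_oplus(X_gamma, y):
    """Cp-oplus for the sub-model whose design matrix is X_gamma (n x p)."""
    n = X_gamma.shape[0]
    beta, *_ = np.linalg.lstsq(X_gamma, y, rcond=None)
    resid = y - X_gamma @ beta                              # rho_[Gamma]i
    sse = np.sum(resid ** 2)
    M = X_gamma.T @ X_gamma / n                             # M_[Gamma]
    W = (X_gamma * resid[:, None] ** 2).T @ X_gamma / n     # W_[Gamma] = n^-1 sum x_i x_i' rho_i^2
    varsigma2 = np.trace(np.linalg.solve(M, W))             # tr(M^-1 W)
    return sse / n + 2.0 * varsigma2 / n

# hypothetical heteroscedastic data with random regressors (assumption-lean setting)
rng = np.random.default_rng(3)
n = 200
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + (0.5 + np.abs(x)) * rng.normal(size=n)  # variance depends on x
X_gamma = np.column_stack([np.ones(n), x])                  # sub-model: intercept + x
print(round(cp_oplus(X_gamma, y), 3))
```

When the residuals are homoscedastic, $W_{[\Gamma]} \approx \hat\sigma^2 M_{[\Gamma]}$, so $\operatorname{tr}(M_{[\Gamma]}^{-1}W_{[\Gamma]}) \approx p\hat\sigma^2$ and the correction term is close to the one in $C_p^U$.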

Page 16: Derivation

• Follows the pattern for derivation of $C_p$.
• Uses the $\Delta$ method and weak LLN (or CLT),
• And the Sandwich estimator.
• Let $\approx$ denote suitable asymptotic approximation.
• Let $\Psi_{[\Gamma]}$ denote the usual sandwich estimator for the covariance matrix of $\hat\beta_{[\Gamma]}$ (assumption-lean setting):
$$\Psi_{[\Gamma]} = M_{[\Gamma]}^{-1}W_{[\Gamma]}M_{[\Gamma]}^{-1}.$$
• Then (suppressing the subscript $[\Gamma]$ for convenience):

Page 17: Derivation, continued

$$\begin{aligned}
R &:= E\left(Y^* - X^{*T}\hat\beta\right)^2 = E\left(Y^* - X^{*T}\beta\right)^2 + E\left(\left(X^{*T}\left(\hat\beta - \beta\right)\right)^2\right)\\
&\approx E\left(Y^* - X^{*T}\beta\right)^2 + E\left(X^{*T}\Psi X^*\big/n\right) && \text{(Sandwich)}\\
&\approx n^{-1}\big\|Y - X\hat\beta\big\|^2 + n^{-1}\operatorname{tr}\left(\Psi X^T X\big/n\right) && \text{(Empirical moment)}\\
&\approx n^{-1}\big\|Y - X\hat\beta\big\|^2 + n^{-1}\operatorname{tr}\left(\Psi X^T X\big/n\right) + n^{-1}\operatorname{tr}\left(\Psi X^T X\big/n\right)\\
&= n^{-1}\big\|Y - X\hat\beta\big\|^2 + 2n^{-1}\operatorname{tr}\left(M^{-1}W\right) =: C_p^{\oplus}.
\end{aligned}$$

End of derivation.

Page 18: Asymptotic Results

Tracking the error terms in the preceding derivation (and being precise about [benign] assumptions on existence of moments) yields:

T1: $C_p^{\oplus}$ is asymptotically unbiased in the sense that $E\left(C_p^{\oplus}\right) = R + O(1/n)$.

T2: $C_p^{\oplus}$ has considerable asymptotic variability about its mean. To be precise, $C_p^{\oplus} = R + O_P(1/\sqrt{n})$.

• T2 is disappointing. Here's why.

Page 19: A Disappointment

T2 says $C_p^{\oplus} = R + O_P(1/\sqrt{n})$.

• $C_p^{\oplus}$ can be written: $C_p^{\oplus} = n^{-1}SSE[\Gamma] + 2n^{-1}\hat\varsigma_{[\Gamma]}^2$.
• The 1st term is an underestimate of $R$. The 2nd term is a correction meant to fix the optimism in the 1st term.
• According to T2, the second term is asymptotically negligible in comparison to the natural randomness in $C_p^{\oplus}$.
• Hence, if the goal of $C_p^{\oplus}$ were to estimate $R$, then the 2nd term is asymptotically useless, and one might just as well use $n^{-1}SSE[\Gamma]$.
• Analogous comments are equally true about $C_p^U$. Hence (asymptotic) unbiasedness is irrelevant for estimation of $R$.

Page 20: Comparison of Sub-Models

• In practice, the use of Mallows-type measures is to compare two (or more) sub-models.
• Let $\Gamma_1, \Gamma_2$ denote two designated sub-models whose predictive power is to be compared in an analysis. Then

T3: $R[\Gamma_2] - R[\Gamma_1] = C^{\oplus}_{[\Gamma_2]} - C^{\oplus}_{[\Gamma_1]} + O_P(1/n)$.

• The stochastic error in the comparison of two sub-models is of the same order as the second terms in the definitions of $C_p^{\oplus}$. Thus those terms may improve the estimate of comparative predictive risks.
• But they do not guarantee success.
• Unfortunately, nothing can do so! (T4.)

Page 21: $C_p^{\oplus}$ Does Not Guarantee Successful Comparisons (not even asymptotically)

• Suppose $Y_i = a + \varepsilon_i$, $\varepsilon_i \sim N(0,1)$, independent.
• Compare two models: $\Gamma_1: Y = \beta_1$ and $\Gamma_2: Y = \beta_1 + \beta_2 X$.
• Then $\Gamma_1$ is "correct" and yields the better predictions.
• BUT, as $n \to \infty$,
$$P\left(C^{\oplus}_{\Gamma_2} < C^{\oplus}_{\Gamma_1}\right) \to P\left(\chi^2_1 > 2\right) = 0.157 = P(\text{wrong choice}).$$
• This shows how far from perfect $C_p^{\oplus}$ is. $C_p$ will do the same, since this is the conventional setup.
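The limiting value is just an upper-tail chi-square probability, and the whole comparison is easy to check by simulation (a sketch of my own; the distribution of the superfluous covariate X is an arbitrary choice, since the truth does not involve it):

```python
import numpy as np
from scipy.stats import chi2

print(round(chi2.sf(2, df=1), 4))          # 0.1573 -- the limiting P(wrong choice)

rng = np.random.default_rng(4)
n, reps, wrong = 200, 5000, 0
for _ in range(reps):
    x = rng.normal(size=n)                 # superfluous covariate (arbitrary choice)
    y = 1.0 + rng.normal(size=n)           # truth is Gamma_1: Y = a + eps
    X1 = np.ones((n, 1))
    X2 = np.column_stack([np.ones(n), x])
    sse1 = np.sum((y - X1 @ np.linalg.lstsq(X1, y, rcond=None)[0]) ** 2)
    sse2 = np.sum((y - X2 @ np.linalg.lstsq(X2, y, rcond=None)[0]) ** 2)
    sigma2 = sse2 / (n - 2)                # sigma^2 estimated from the "full" model Gamma_2
    cp1 = sse1 / sigma2 - n + 2 * 1
    cp2 = sse2 / sigma2 - n + 2 * 2
    wrong += cp2 < cp1                     # Cp prefers the wrong (larger) model
print(wrong / reps)                        # close to 0.157 for large n
```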

Page 22: Can $C_p$ (or $C_p^U$) and $C_p^{\oplus}$ give different model choices?

• They can! But in well-behaved examples they often do not.
• Here are two moderately well-behaved real-data examples to illustrate this;
• followed by a third example, from a simulation, that demonstrates a more extreme case.

Page 23: Example: Prostate Data from Tibshirani (1996), as discussed previously

Stepwise and All-subsets yield the same model choices. Here is a stepwise table comparing results from $C_p^U$ and $C_p^{\oplus}$.

p    Variable  Cp^U    Cp⊕
[2]  lpsa      0.6433  0.6557
[3]  lcp       0.5076  0.5169
[4]  age       0.5070  0.5155
[5]  lbph      0.5031  0.5117
[6]  pgg45     0.5074  0.5114
[7]  gleason   0.5076  0.5090
[8]  svi       0.5159  0.5143
[9]  lwt       0.5259  0.5236

Scree plot ---

Page 24: [Scree plot for the prostate example.]

Page 25: Criminal Sentencing

This data is analyzed in WLMRG (2015, "Conspiracy") and McCarthy, Zhang & WLMRG (2016, Double bootstrap). Used here is a random sample of size 500 from a much larger data set collected and analyzed by R. Berk.

The Y variable is the length of criminal sentence. The covariates are demographic descriptors of the sentenced individuals & their prior criminal justice experience, and the type and severity of the crime for which they are sentenced; r = 14. Here is the scree plot for all-subsets regression, showing Mallows-type values for the best subset for each value of p.

Page 26: [Scree plot for the criminal sentencing example.]

Note the best subset size was 7 under both measures. The subsets were the same. The curves are each fairly horizontal. (Note the vertical scale.) If operating stepwise, one might decide to use p = 4 under $C_p^{\oplus}$ and p = 5 under $C_p$. $C_p^{\oplus}$ is above $C_p$, as is typical but not necessary.

Page 27: Simulation

• In order to show what can happen in less well-behaved settings, we simulated data from a particular joint distribution, F.
• It was chosen by trial and error to yield interesting results. See the next page for a sample data set.
• The analytical models to compare were $\Gamma_1: Y = b_1$; $\Gamma_2: Y = b_1 + b_2X$.
• [Neither model is a correct, perfect description of F.]
• Following is a scatterplot for a typical data set, followed by other info. You can see both the constant and linear models are mis-specified and there is strong leverage.

(Simulation had 4,000 replications.)

Page 28: Typical scatterplot from simulation

Note the possible non-linearity and/or high-leverage outlier. (What do you think is the true distribution that generated this data?)

[Scatterplot of Y against X.]

Page 29:

p  Rp    Cp    Cp⊕
1  2.84  2.84  2.85
2  3.26  2.28  2.72

Table 1: Actual value of the predictive squared risk and the averages of its estimates.

• Neither Mallows-type estimate does a good average job of estimating $R_2$. (PS: Simulation error was nearly negligible.) But $C_2^{\oplus}$ does better on average than $C_2$.
• The better model as between the two is p = 1. And/But $C_1^{\oplus} < C_2^{\oplus}$ (as hoped) 52% of the time, while $C_1 < C_2$ only 25% of the time.
• SO, as theory predicts, $C_p^{\oplus}$ does better than $C_p$ at comparing, though neither does sparklingly well.

Page 30: Takeaways

• For observational data, Mallows' $C_p$ is an unjustified estimate of predictive risk. $C_p^{\oplus}$ is a valid estimate.
• Both have a high noise level ⇒ may not be very useful.
• Even for estimating $\Delta$ between sub-models, $C_p^{\oplus}$ has a high noise-to-signal ratio (as does $C_p$), and so may not give the right answer in challenging situations.
• BUT at least $C_p^{\oplus}$ aims at the correct target, even in assumption-lean settings. Also, it's almost as easy to compute as $C_p$.

Here's what Mallows wrote/said about $C_p$, and is equally true about $C_p^{\oplus}$:

Page 31: Mallows' view of how Cp should – and should not – be used (1973)

"The discussion above does not lend any support to the practice of taking the lowest point on a Cp-plot as defining a "best" subset of terms. The present author feels that the greatest value of the device is that it helps the statistician to examine some aspects of the structure of his data and helps him to recognize the ambiguities that confront him. The device cannot be expected to provide a single "best" equation when the data are intrinsically inadequate to support such a strong inference." ... "...the ambiguous cases where the "minimum Cp" rule will give bad results are exactly those where a large number of subsets are close competitors for the honor. With such data no selection rule can be expected to perform reliably."

Page 32:

Further research in progress reveals non-trivial limits on the possibility of correctly identifying better and/or best models. It turns out that $C_p$, when applicable, and $C_p^{\oplus}$ more generally provide a (nearly) optimal statistic for such purposes if used properly.

Page 33: Take-aways (cont.)

• As Andreas Buja likes to express this idea:
"Look at the Scree plot. It will typically fall steeply, sort of level out, and then may gradually rise. Keep the variables for which the plot is falling steeply, throw out all those after the plot has noticeably risen, and do what you want with the variables in between. For a parsimonious and usually satisfactory model, throw out all the variables after the steep fall."

P.S.: It is possible to pursue further distribution theory about $C_p$ and $C_p^{\oplus}$ to provide a statistical quantification as to what "steeply" and "noticeably" mean. We know how to do this for $C_p$. We're still working on this for $C_p^{\oplus}$, and to improve results for $C_p$.

Page 34: end

Parenthetical note: General "conspiracy" theory suggests that if the covariates are random but the linear model is first- and second-order well-specified, then fixed-X analysis is correct. However, that's not necessarily true with respect to the $C_p$ goal of estimating predictive risk when using sub-models. Even for such a case one should use $C_p^{\oplus}$ or an analogous measure. The reason is that the sub-model need not be well-specified. Thus, even though $E(Y \mid X)$ is linear in the vector $X$, it need not be the case that $E(Y \mid X_{[p]})$ is linear in the reduced-model vector $X_{[p]}$.

Page 35: PS on the final simulation

• The data are from
$$X \sim \exp(1) - 1, \qquad Y = b_1 + b_2X + b_3X^3 + Z, \qquad n = 100,$$
$$b_1 = b_2 = b_3 = 0.1.$$
(This choice gave interesting results.)
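For completeness, here is a sketch of how such data could be regenerated and the two criteria compared over replications (my own code, not the original simulation; the slide does not specify the distribution of Z, so Z ~ N(0,1) is an assumption here):

```python
import numpy as np

rng = np.random.default_rng(5)
b1 = b2 = b3 = 0.1
n, reps = 100, 4000
prefer_small_cp, prefer_small_cpo = 0, 0

def cp_oplus(Xg, y):
    """Cp-oplus = SSE/n + 2 tr(M^-1 W)/n for the sub-model with design Xg."""
    m = Xg.shape[0]
    beta, *_ = np.linalg.lstsq(Xg, y, rcond=None)
    resid = y - Xg @ beta
    M = Xg.T @ Xg / m
    W = (Xg * resid[:, None] ** 2).T @ Xg / m
    return np.sum(resid ** 2) / m + 2 * np.trace(np.linalg.solve(M, W)) / m

for _ in range(reps):
    x = rng.exponential(1.0, size=n) - 1.0            # X ~ exp(1) - 1
    z = rng.normal(size=n)                            # assumption: Z ~ N(0, 1)
    y = b1 + b2 * x + b3 * x**3 + z
    X1 = np.ones((n, 1))                              # Gamma_1: Y = b1
    X2 = np.column_stack([np.ones(n), x])             # Gamma_2: Y = b1 + b2 X
    sse1 = np.sum((y - X1 @ np.linalg.lstsq(X1, y, rcond=None)[0]) ** 2)
    sse2 = np.sum((y - X2 @ np.linalg.lstsq(X2, y, rcond=None)[0]) ** 2)
    sigma2 = sse2 / (n - 2)                           # the "full" model here is Gamma_2
    cp1, cp2 = sse1 / sigma2 - n + 2, sse2 / sigma2 - n + 4
    prefer_small_cp += cp1 < cp2                      # Cp prefers the smaller model
    prefer_small_cpo += cp_oplus(X1, y) < cp_oplus(X2, y)

print("fraction with C1  < C2  :", prefer_small_cp / reps)
print("fraction with C1+ < C2+ :", prefer_small_cpo / reps)
```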