
Stat 231 List of Concepts and Formulas

"Course Review" Sheets

Prof. Stephen Vardeman

Iowa State University, Statistics and IMSE

September 6, 2011

Vardeman (ISU Stat and IMSE) Stat 231 Summary September 6, 2011 1 / 58

Day 1-Introduction

Probability vs Statistics

Simple Descriptive Statistics

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$

Properties of $\bar{x}$ and $s$

for $y = ax + b$, $\bar{y} = a\bar{x} + b$ and $s_y = |a|\, s_x$

JMP
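The summaries above and their behavior under a linear change of units can be checked numerically. The following is a minimal Python sketch (the course itself uses JMP); the function names and the data values are illustrative assumptions, not part of the course materials.

```python
import math

def sample_mean(xs):
    """Sample mean: (1/n) * sum of x_i."""
    return sum(xs) / len(xs)

def sample_variance(xs):
    """Sample variance s^2, using the n - 1 divisor."""
    xbar = sample_mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)

def sample_sd(xs):
    """Sample standard deviation s."""
    return math.sqrt(sample_variance(xs))

# Hypothetical data, chosen only for illustration.
xs = [2.0, 4.0, 4.0, 5.0, 10.0]
a, b = -3.0, 7.0
ys = [a * x + b for x in xs]

print(sample_mean(xs), sample_variance(xs))  # 5.0 9.0

# For y = a*x + b: ybar = a*xbar + b and s_y = |a| * s_x.
print(math.isclose(sample_mean(ys), a * sample_mean(xs) + b))
print(math.isclose(sample_sd(ys), abs(a) * sample_sd(xs)))
```

Note that the shift $b$ moves the mean but leaves the standard deviation unchanged, while the sign of $a$ is lost in $s_y$.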


Day 2-Notions of "Chance" and Mathematical Theory

Sample Space (Universal Set) $S$

Events (Sets) $A, B$

Empty Event ∅


Day 3-Set Operations on Events

Words to Symbols and Symbols to Words

Set Operations on Events

$A \text{ and } B = A \cap B$

$A \text{ or } B = A \cup B$

$\text{not } A = A^c$ (also written $\bar{A}$)

Mutually Exclusive (Disjoint) Events $A, B$

$A \text{ and } B = \emptyset$


Day 4-Axioms of Probability and "the Addition Rule"

Basic Rules of Operation

1. $0 \le P(A) \le 1$
2. $P(S) = 1$ (and $P(\emptyset) = 0$)
3. If $A_1, A_2, \ldots$ are disjoint events, then $P(A_1 \text{ or } A_2 \text{ or } \ldots) = P(A_1) + P(A_2) + \cdots$

A Small "Theorem"

$$P(\text{not } A) = 1 - P(A)$$

The "Addition Rule" (Another Theorem)

$$P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$$


Day 5-Conditional Probability and Independence of Events

Conditional Probability of A Given B

$$P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)}$$

The "Multiplication Rule"

$$P(A \text{ and } B) = P(A \mid B) \cdot P(B)$$

Events $A, B$ are Independent Exactly When

$$P(A \mid B) = P(A) \quad \text{i.e. when} \quad P(A \text{ and } B) = P(A) \cdot P(B)$$

(Multiple Events are Independent When Every Intersection of Any Collection of Them (or Their Complements) Has Probability Obtainable as a Product of Individual Probabilities)
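The definitions of conditional probability and independence can be illustrated by brute-force enumeration. The sketch below assumes a hypothetical experiment (two fair dice, all 36 outcomes equally likely); the events `A`, `B`, `C` and the helper names are illustrative choices only.

```python
from fractions import Fraction
from itertools import product

# Sample space S: all 36 equally likely outcomes of two fair dice (an assumption).
S = list(product(range(1, 7), repeat=2))

def prob(event):
    """P(event), where event is a true/false function of an outcome."""
    return Fraction(sum(1 for w in S if event(w)), len(S))

def cond_prob(a, b):
    """P(A | B) = P(A and B) / P(B)."""
    return prob(lambda w: a(w) and b(w)) / prob(b)

def A(w):
    return w[0] == 6            # first die shows 6

def B(w):
    return w[0] + w[1] == 7     # sum is 7

def C(w):
    return w[0] + w[1] >= 10    # sum is at least 10

# A and B are independent: P(A | B) equals P(A).
print(cond_prob(A, B), prob(A))   # 1/6 1/6

# A and C are not independent: knowing C changes the probability of A.
print(cond_prob(A, C), prob(A))   # 1/2 1/6
```

Exact fractions are used so that the independence check is an equality rather than a floating-point comparison.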


Day 6-Counting

When Outcomes are Equally Likely

$$P(A) = \frac{\#(A)}{\#(S)}$$

A Basic Principle: When a complex action can be broken into a series of $k$ component actions, the first of which can be done $n_1$ ways, the second of which can subsequently be done $n_2$ ways, the third of which can subsequently be done $n_3$ ways, etc., the whole can be accomplished in

$$n_1 \cdot n_2 \cdot \cdots \cdot n_k$$

different ways

Count of Possible Permutations

$$P_{r,n} = \frac{n!}{(n-r)!}$$

Count of Possible Combinations

$$\binom{n}{r} = \frac{n!}{r!\,(n-r)!}$$
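The two counting formulas can be written directly from the factorial expressions and cross-checked against Python's built-in `math.perm` and `math.comb` (available in Python 3.8 and later). The card-hand probability at the end is a hypothetical illustration of the equally-likely-outcomes rule, not an example from the course.

```python
import math
from fractions import Fraction

def permutations_count(n, r):
    """P_{r,n} = n! / (n - r)!: ordered selections of r items from n."""
    return math.factorial(n) // math.factorial(n - r)

def combinations_count(n, r):
    """n! / (r! (n - r)!): unordered selections of r items from n."""
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

print(permutations_count(5, 2), math.perm(5, 2))  # 20 20
print(combinations_count(5, 2), math.comb(5, 2))  # 10 10

# Equally likely outcomes: P(A) = #(A) / #(S).
# Hypothetical example: a 5-card hand from a 52-card deck is all hearts.
p_all_hearts = Fraction(combinations_count(13, 5), combinations_count(52, 5))
print(p_all_hearts, float(p_all_hearts))
```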


Day 7-Discrete Random Variables and Specifying Their Distributions

Probability Mass Function

$$f(x) = P[X = x]$$

Cumulative Distribution Function

$$F(x) = P[X \le x] \quad \text{(general)}$$

$$F(x) = \sum_{z \le x} f(z) \quad \text{(discrete)}$$


Day 8-Expectation for Discrete Variables

Expected (or Mean) Value of h (X ) for a Discrete X

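$$E\,h(X) = \sum_{x} h(x)\, f(x)$$

Mean of $X$ (Mean/Center of the Distribution of $X$)

$$EX = \sum_{x} x\, f(x) \quad (= \mu_X)$$

Variance of $X$ (a Measure of Spread for the Distribution of $X$)

$$\begin{aligned}
\operatorname{Var} X &= \sum_{x} (x - EX)^2 f(x) \quad (= \sigma_X^2) \\
&= \sum_{x} x^2 f(x) - (EX)^2 \\
&= EX^2 - (EX)^2
\end{aligned}$$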
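The expectation formulas translate into a single weighted sum over the pmf. The sketch below uses a small hypothetical pmf stored as a dictionary; the name `expect` is an illustrative helper, and exact fractions are used so that the two variance expressions can be compared exactly.

```python
from fractions import Fraction

# Hypothetical pmf f(x), given as {x: f(x)}; the probabilities sum to 1.
f = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def expect(h, pmf):
    """E h(X) = sum over x of h(x) * f(x)."""
    return sum(h(x) * p for x, p in pmf.items())

mean = expect(lambda x: x, f)                       # EX
var_definition = expect(lambda x: (x - mean) ** 2, f)
var_shortcut = expect(lambda x: x ** 2, f) - mean ** 2

print(mean, var_definition, var_shortcut)  # 1 1/2 1/2
```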


Day 9-More Mean and Variance/Independent Identical Success-Failure Trials

Chebyshev's Inequality (general)

$$P[\mu_X - k\sigma_X < X < \mu_X + k\sigma_X] \ge 1 - \frac{1}{k^2}$$

Other Useful Facts (general)

$$E(aX + b) = a\,EX + b$$

$$\operatorname{Var}(aX + b) = a^2 \operatorname{Var} X$$

$$\sigma_{aX+b} = \sqrt{\operatorname{Var}(aX + b)} = |a|\,\sigma_X$$

A Convenient (and Sometimes Appropriate) Model is the "Bernoulli Trials" Model:

1. $P[\text{success on trial } i] = p$ (fixed, the same for all $i$)
2. The events $A_i$ = "success on trial $i$" are all independent


Day 10-Binomial and Geometric Distributions

Under the Bernoulli Trials Model:

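$X$ = the number of successes in $n$ trials Has the Binomial($n, p$) Distribution

$$f(x) = \begin{cases} \dbinom{n}{x} p^x (1-p)^{n-x} & \text{for } x = 0, 1, \ldots, n \\ 0 & \text{otherwise} \end{cases}$$

With $EX = np$ and $\operatorname{Var} X = np(1-p)$

$X$ = the trial on which the first success occurs Has the Geometric($p$) Distribution

$$f(x) = \begin{cases} p(1-p)^{x-1} & \text{for } x = 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases}$$

With $1 - F(x) = (1-p)^x$, $EX = \dfrac{1}{p}$ and $\operatorname{Var} X = \dfrac{1-p}{p^2}$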
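The binomial and geometric formulas can be checked against the general definitions of mean and variance for a discrete variable. This is a minimal sketch assuming integer arguments `x`; the parameter values are hypothetical.

```python
import math

def binomial_pmf(x, n, p):
    """Binomial(n, p) pmf for integer x; 0 outside x = 0, 1, ..., n."""
    if x < 0 or x > n:
        return 0.0
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def geometric_pmf(x, p):
    """Geometric(p) pmf for integer x; 0 outside x = 1, 2, ..."""
    if x < 1:
        return 0.0
    return p * (1 - p) ** (x - 1)

# Hypothetical binomial parameters.
n, p = 10, 0.3
mean = sum(x * binomial_pmf(x, n, p) for x in range(n + 1))
var = sum((x - mean) ** 2 * binomial_pmf(x, n, p) for x in range(n + 1))
print(math.isclose(mean, n * p), math.isclose(var, n * p * (1 - p)))

# Geometric tail: 1 - F(x) = (1 - p)^x, checked here at x = 3 with p = 0.25.
q = 0.25
tail = 1 - sum(geometric_pmf(x, q) for x in range(1, 4))
print(math.isclose(tail, (1 - q) ** 3))
```

The geometric tail formula has a direct reading: the first success comes after trial $x$ exactly when the first $x$ trials are all failures.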


Day 11-Geometric and Poisson Distributions

The Poisson($\lambda$) Distribution is a Commonly Used Model for

$X$ = the number of occurrences of a relatively rare phenomenon across a fixed interval of time or space

This Has

$$f(x) = \begin{cases} \dfrac{e^{-\lambda}\lambda^x}{x!} & \text{for } x = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases}$$

With $EX = \lambda$ and $\operatorname{Var} X = \lambda$


Day 12-Continuous Random Variables, pdf’s and cdf’s

Probability Density Function, $f(x) \ge 0$ with

$$P[a \le X \le b] = \int_a^b f(x)\,dx$$

(Continuous) Cumulative Distribution Function

$$F(x) = P[X \le x] = \int_{-\infty}^{x} f(t)\,dt$$

cdf to pdf

$$\frac{d}{dx}F(x) = f(x)$$


Day 13-Expectation for Continuous Variables/Normal Distributions

Expected (or Mean) Value of $h(X)$ for a Continuous $X$

$$E\,h(X) = \int_{-\infty}^{\infty} h(x)\, f(x)\,dx$$

Mean of $X$ (Mean/Center of the Distribution of $X$)

$$EX = \int_{-\infty}^{\infty} x\, f(x)\,dx \quad (= \mu_X)$$

Variance of $X$ (a Measure of Spread for the Distribution of $X$)

$$\begin{aligned}
\operatorname{Var} X &= \int_{-\infty}^{\infty} (x - EX)^2 f(x)\,dx \quad (= \sigma_X^2) \\
&= \int_{-\infty}^{\infty} x^2 f(x)\,dx - (EX)^2 \\
&= EX^2 - (EX)^2
\end{aligned}$$

All the Day 9 Facts Hold for Continuous Variables

Day 14-Normal Distributions

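Normal($\mu, \sigma^2$) pdf

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x-\mu)^2/(2\sigma^2)} \quad \text{for all } x$$

Standard Normal ($\mu = 0$, $\sigma = 1$) Version

$$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2} \quad \text{for all } z$$

Standard Normal cdf (tabled)

$$\Phi(z) = F(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\,dt$$

Conversion to Standard Units

$$z = \frac{x - \mu}{\sigma}$$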
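Conversion to standard units reduces any normal probability to a lookup of $\Phi$. The sketch below uses `statistics.NormalDist` from the Python standard library in place of a printed table; the values of $\mu$, $\sigma$ and the interval endpoints are hypothetical.

```python
import math
from statistics import NormalDist

def to_standard_units(x, mu, sigma):
    """z = (x - mu) / sigma."""
    return (x - mu) / sigma

std_normal = NormalDist()        # mean 0, standard deviation 1; cdf is Phi

# Hypothetical Normal(mu, sigma^2) variable and interval.
mu, sigma = 100.0, 15.0
z_lo = to_standard_units(85.0, mu, sigma)    # -1.0
z_hi = to_standard_units(130.0, mu, sigma)   # 2.0

# P[85 <= X <= 130] = Phi(2) - Phi(-1)
p = std_normal.cdf(z_hi) - std_normal.cdf(z_lo)
print(round(p, 4))

# The same probability computed without standardizing first.
direct = NormalDist(mu, sigma).cdf(130.0) - NormalDist(mu, sigma).cdf(85.0)
print(math.isclose(p, direct))
```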


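Day 15-Normal Approximation to Binomial/Exponential and Weibull Distributions

For Large $n$ and Moderate $p$, a Binomial($n, p$) Distribution is Approximately Normal (With $\mu = np$ and $\sigma^2 = np(1-p)$)

The Exponential($\lambda$) Distribution Has

$$f(x) = \begin{cases} \lambda e^{-\lambda x} & \text{for } x > 0 \\ 0 & \text{otherwise} \end{cases}$$

With $EX = \dfrac{1}{\lambda}$, $\operatorname{Var} X = \dfrac{1}{\lambda^2}$ and

$$F(x) = \begin{cases} 0 & \text{if } x \le 0 \\ 1 - e^{-\lambda x} & \text{if } x > 0 \end{cases}$$

The Weibull($\alpha, \beta$) Distribution Has

$$F(x) = \begin{cases} 0 & \text{if } x < 0 \\ 1 - e^{-(x/\beta)^{\alpha}} & \text{if } x \ge 0 \end{cases}$$

With Median $F^{-1}(.5) = \beta e^{-(.3665/\alpha)}$ and Scale Parameter $\beta$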


Day 16-Jointly Discrete Random Variables

Joint Probability Mass Function

$$f(x, y) = P[X = x \text{ and } Y = y]$$

Marginal Probability Mass Functions

$$g(x) = \sum_{y} f(x, y) \quad \text{and} \quad h(y) = \sum_{x} f(x, y)$$

Conditional Probability Mass Functions

$$g(x \mid y) = \frac{f(x, y)}{h(y)} \quad \text{and} \quad h(y \mid x) = \frac{f(x, y)}{g(x)}$$


Day 17-Jointly Discrete and Continuous Variables

Independence of Discrete Random Variables

$$f(x, y) = g(x)\,h(y) \quad \text{for all } x, y$$

Joint Probability Density Function $f(x, y) \ge 0$

$$P[(X, Y) \in R] = \iint_{R} f(x, y)\,dx\,dy$$

Marginal Probability Density Functions

$$g(x) = \int_{-\infty}^{\infty} f(x, y)\,dy \quad \text{and} \quad h(y) = \int_{-\infty}^{\infty} f(x, y)\,dx$$


Day 18-Continuous Variables, Conditionals, and Independence

Conditional Probability Densities

$$g(x \mid y) = \frac{f(x, y)}{h(y)} \quad \text{and} \quad h(y \mid x) = \frac{f(x, y)}{g(x)}$$

Independence of Continuous Random Variables

$$f(x, y) = g(x)\,h(y) \quad \text{for all } x, y$$


Day 19-Functions of Several Random Variables and Expectation

For Jointly Distributed Variables $X, Y, \ldots, Z$ the Distribution of $U = g(X, Y, \ldots, Z)$ Can Sometimes Be Derived

JMP Simulation to Approximate the Distribution of $U = g(X, Y, \ldots, Z)$ (and $EU$) for Independent $X, Y, \ldots, Z$

Expectation of $U = g(X, Y)$

$$E\,g(X, Y) = \sum_{x, y} g(x, y)\, f(x, y) \quad \text{(discrete)}$$

$$E\,g(X, Y) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x, y)\, f(x, y)\,dx\,dy \quad \text{(continuous)}$$


Day 20-Covariance, Correlation, and Laws of Expectation

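$$\begin{aligned}
\operatorname{Cov}(X, Y) &= E\left[(X - EX)(Y - EY)\right] \quad \left(= E\left[(X - \mu_X)(Y - \mu_Y)\right]\right) \\
&= EXY - EX\,EY \quad (= EXY - \mu_X \mu_Y)
\end{aligned}$$

$$\rho = \operatorname{Corr}(X, Y) = \frac{\operatorname{Cov}(X, Y)}{\sqrt{\operatorname{Var} X \cdot \operatorname{Var} Y}} = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y}$$

$-1 \le \rho \le 1$ With $\rho = \pm 1$ Exactly When $X$ and $Y$ are Perfectly Linearly Related

$X, Y$ Independent Implies $\rho = 0$

$E(aX + b) = a\,EX + b$ (from Day 9)

$X, Y$ Independent Implies $E\,s(X)\,t(Y) = E\,s(X)\; E\,t(Y)$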


Day 21-Laws of Expectation and Variance

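$\operatorname{Var} X = EX^2 - (EX)^2$ (From Day 8)

$\operatorname{Var}(aX + b) = a^2 \operatorname{Var} X$ (From Day 9)

$\operatorname{Var}(aX + bY) = a^2 \operatorname{Var} X + b^2 \operatorname{Var} Y + 2ab \operatorname{Cov}(X, Y)$

$X, Y$ Independent Implies $\operatorname{Var}(aX + bY) = a^2 \operatorname{Var} X + b^2 \operatorname{Var} Y$

For Independent $X, Y, \ldots, Z$, $U = a_0 + a_1 X + a_2 Y + \cdots + a_n Z$ Has

$$EU = a_0 + a_1 EX + a_2 EY + \cdots + a_n EZ$$

$$\operatorname{Var} U = a_1^2 \operatorname{Var} X + a_2^2 \operatorname{Var} Y + \cdots + a_n^2 \operatorname{Var} Z$$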
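The covariance, correlation, and variance laws can be verified on a small joint distribution. The sketch below assumes a hypothetical joint pmf on four points (with dependent $X$ and $Y$, so the covariance term matters) and compares the formula for $\operatorname{Var}(aX + bY)$ with a direct computation from the distribution of $U = aX + bY$.

```python
from fractions import Fraction

# Hypothetical joint pmf f(x, y), stored as {(x, y): probability}.
joint = {
    (0, 0): Fraction(3, 8), (0, 1): Fraction(1, 8),
    (1, 0): Fraction(1, 8), (1, 1): Fraction(3, 8),
}

def expect(g):
    """E g(X, Y) = sum over (x, y) of g(x, y) * f(x, y)."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

ex, ey = expect(lambda x, y: x), expect(lambda x, y: y)
var_x = expect(lambda x, y: x * x) - ex ** 2
var_y = expect(lambda x, y: y * y) - ey ** 2
cov = expect(lambda x, y: x * y) - ex * ey

# rho^2 is computed to stay in exact arithmetic (no square root needed).
rho_squared = cov ** 2 / (var_x * var_y)

a, b = 2, -3
var_formula = a ** 2 * var_x + b ** 2 * var_y + 2 * a * b * cov
eu = expect(lambda x, y: a * x + b * y)
var_direct = expect(lambda x, y: (a * x + b * y) ** 2) - eu ** 2

print(cov, rho_squared)          # 1/8 1/4
print(var_formula, var_direct)   # 7/4 7/4
```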


Day 22-Propagation of Error/Transition to Statistics

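For Independent $X, Y, \ldots, Z$ Approximations for the Mean and Variance for $U = g(X, Y, \ldots, Z)$ Are

$$E\,g(X, Y, \ldots, Z) \approx g(\mu_X, \mu_Y, \ldots, \mu_Z)$$

$$\operatorname{Var} g(X, Y, \ldots, Z) \approx \left(\frac{\partial g}{\partial x}\right)^2 \operatorname{Var} X + \cdots + \left(\frac{\partial g}{\partial z}\right)^2 \operatorname{Var} Z$$

(Where the Partials Are Evaluated at $\mu_X, \mu_Y, \ldots, \mu_Z$)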
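The propagation of error approximations can be sketched numerically. In the block below the partial derivatives are estimated by central differences rather than derived by hand, which is a convenience chosen for this illustration; the function name `propagate`, the step-size rule, and the example $g(x, y) = xy$ with its means and variances are all assumptions.

```python
import math

def propagate(g, means, variances, rel_step=1e-6):
    """Approximate mean and variance of g(X, ..., Z) for independent inputs.

    Mean:     g evaluated at the input means.
    Variance: sum of (partial derivative at the means)^2 * input variance,
              with each partial estimated by a central difference.
    """
    approx_mean = g(*means)
    approx_var = 0.0
    for i, (m, v) in enumerate(zip(means, variances)):
        step = rel_step * max(1.0, abs(m))
        up, down = list(means), list(means)
        up[i], down[i] = m + step, m - step
        partial = (g(*up) - g(*down)) / (2 * step)
        approx_var += partial ** 2 * v
    return approx_mean, approx_var

# Hypothetical example: U = X * Y with independent X and Y.
mean_u, var_u = propagate(lambda x, y: x * y,
                          means=[10.0, 2.0],
                          variances=[0.04, 0.01])

# By hand: partials are y = 2 and x = 10, so Var U ~ 4(0.04) + 100(0.01) = 1.16.
print(mean_u, round(var_u, 6))
```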

Random Sampling From a Large Population or a Physically Stable Process is (at Least Approximately) Described by a Model That Says Data $X_1, X_2, \ldots, X_n$ Are Independent Identically Distributed Random Variables (With Marginal Probability Distribution the Population Relative Frequency Distribution)


Day 23-Distributional Properties of the Sample Mean

If $X_1, X_2, \ldots, X_n$ Are Independent Identically Distributed (Each With Mean $\mu$ and Variance $\sigma^2$) the Random Variable

$$\bar{X} = \frac{1}{n}(X_1 + X_2 + \cdots + X_n)$$

Has

$$E\bar{X} = \mu_{\bar{X}} = \mu$$

$$\operatorname{Var} \bar{X} = \sigma^2_{\bar{X}} = \frac{\sigma^2}{n}$$

Further, $\bar{X}$ is Approximately Normal if

1. The Population Distribution is Itself Normal
2. The Sample Size, $n$, is Large (The Central Limit Theorem)


Day 24-Introduction to Confidence Intervals

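Following From

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \quad \text{is (at least approximately) Standard Normal}$$

The Interval Formula

$$\left(\bar{X} - z\frac{\sigma}{\sqrt{n}},\ \bar{X} + z\frac{\sigma}{\sqrt{n}}\right)$$

Will Cover $\mu$ In a Fraction $P(-z < Z < z)$ of All Applications

The End Points

$$\bar{X} \pm z\frac{\sigma}{\sqrt{n}}$$

Are Thus (Typically Practically Unusable) Confidence Limits for $\mu$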


Warning About Convention

Henceforth Drop the Convention That Random Variables Are Represented By Capital Letters and Their Possible Values by Lower Case Letters. Typically (But Not Always) Lower Case Will Be Used For Both, and Context Will Have to Be Used to Distinguish.


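Day 25-Large Sample Confidence Intervals for Means and Proportions

Large $n$ Confidence Limits for $\mu$

$$\bar{x} \pm z\frac{s}{\sqrt{n}}$$

Follow From

$$Z = \frac{\bar{x} - \mu}{s/\sqrt{n}} \sim \text{(at least approximately) Standard Normal}$$

For Large $n$, Confidence Limits for $p$

$$\hat{p} \pm z\sqrt{\frac{\tilde{p}(1 - \tilde{p})}{n}}$$

(Where $\tilde{p} = (n\hat{p} + 2)/(n + 4)$) Follow From

$$Z = \frac{\hat{p} - p}{\sqrt{\dfrac{p(1 - p)}{n}}} \sim \text{(at least approximately) Standard Normal}$$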
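The large-$n$ limits for $\mu$ need only the summary statistics and a standard normal quantile. The sketch below obtains $z$ from `statistics.NormalDist().inv_cdf` instead of a table and treats the interval as two-sided; the function name and the summary values ($n = 64$, $\bar{x} = 50$, $s = 8$) are hypothetical.

```python
import math
from statistics import NormalDist

def large_n_mean_limits(xbar, s, n, confidence=0.95):
    """Two-sided large-n confidence limits xbar -/+ z * s / sqrt(n)."""
    # z is chosen so that P(-z < Z < z) equals the confidence level.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width

lower, upper = large_n_mean_limits(xbar=50.0, s=8.0, n=64)
print(round(lower, 2), round(upper, 2))  # 48.04 51.96
```

Quadrupling the sample size halves the width of the interval, since the half-width is proportional to $1/\sqrt{n}$.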


Day 26-Small Sample Confidence Intervals for a (Normal) Mean

Small $n$ Confidence Limits for $\mu$ (When Sampling From a Normal Distribution)

$$\bar{x} \pm t\frac{s}{\sqrt{n}}$$

(For $t$ a Percentage Point of the $t_{n-1}$ Distribution) Follow From

$$T = \frac{\bar{x} - \mu}{s/\sqrt{n}} \sim t_{n-1}$$


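Day 27-Small Sample Confidence Intervals for a (Normal) Standard Deviation and Normal Prediction Limits

Confidence Limits for $\sigma$ (For a Normal Distribution)

$$s\sqrt{\frac{n-1}{\chi^2_{\text{upper}}}} \quad \text{and} \quad s\sqrt{\frac{n-1}{\chi^2_{\text{lower}}}}$$

Follow From

$$X^2 = \frac{(n-1)s^2}{\sigma^2} \sim \chi^2_{n-1}$$

Prediction Limits for $x_{\text{new}}$ (From a Normal Distribution)

$$\bar{x} \pm t s\sqrt{1 + \frac{1}{n}}$$

(For $t$ a Percentage Point of the $t_{n-1}$ Distribution) Follow From

$$T = \frac{\bar{x} - x_{\text{new}}}{s\sqrt{1 + \dfrac{1}{n}}} \sim t_{n-1}$$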


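Day 28-Normal Prediction and Tolerance Limits/Normal Plotting

"Tolerance" Limits for a Large (User Chosen) Part of a Normal Distribution

$$\bar{x} \pm \tau_2 s \quad \text{(two-sided)}$$

$$\bar{x} - \tau_1 s \quad \text{or} \quad \bar{x} + \tau_1 s \quad \text{(one-sided)}$$

Where $\tau_2$ or $\tau_1$ is Chosen For Given "Part of the Distribution" and Confidence Level

Normal Plots For an Ordered Data Set $x_1 \le x_2 \le \cdots \le x_n$ Made Plotting $n$ Points

$$\left(\left(\tfrac{i - .5}{n}\right) \text{data quantile},\ \left(\tfrac{i - .5}{n}\right) \text{standard normal quantile}\right) = \left(x_i,\ \Phi^{-1}\!\left(\tfrac{i - .5}{n}\right)\right) = (x_i, z_i)$$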


Day 29-Hypothesis Testing Introduction 1

Devore 7-Step Format

Null and Alternative Hypotheses

Test Statistic

Type 1 and Type 2 Errors


Day 30-Hypothesis Testing Introduction 2

Test Criteria/Rejection Criteria and Corresponding Error Probabilities

α, β and Their Competing Demands

Hypothesis Testing/Criminal Trial Analogy


Day 31-One-Sample Testing for a Mean

Large $n$ Testing of $H_0\colon \mu = \#$ Uses

$$Z = \frac{\bar{x} - \#}{s/\sqrt{n}}$$

and a Standard Normal Reference Distribution

(Normal Distribution) Small $n$ Testing of $H_0\colon \mu = \#$ Uses

$$T = \frac{\bar{x} - \#}{s/\sqrt{n}}$$

and a $t_{n-1}$ Reference Distribution
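The large-$n$ test statistic can be sketched as follows. The block assumes a two-sided alternative, so "more extreme" is read as "larger in absolute value", and it reports both the rejection decision at level $\alpha$ and the two-sided tail area beyond the observed $|z|$ (the p-value idea listed under Day 32). The function name and the summary values are hypothetical; the small-$n$ version would swap in a $t_{n-1}$ table value, which the Python standard library does not provide.

```python
import math
from statistics import NormalDist

def large_n_z_test(xbar, s, n, mu0, alpha=0.05):
    """Two-sided large-n test of H0: mu = mu0 (mu0 plays the role of #)."""
    z = (xbar - mu0) / (s / math.sqrt(n))
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)
    # Two-sided tail area beyond |z| under the standard normal reference.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, abs(z) > z_critical

z, p_value, reject = large_n_z_test(xbar=52.5, s=8.0, n=64, mu0=50.0)
print(z, round(p_value, 4), reject)
```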


Day 32-One Sample Testing for a Proportion/"p-values"

Large $n$ Testing of $H_0\colon p = \#$ Uses

$$Z = \frac{\hat{p} - \#}{\sqrt{\dfrac{\#(1 - \#)}{n}}}$$

and a Standard Normal Reference Distribution

In ANY Hypothesis Testing Context

p-value = "observed level of significance" = the probability (computed under $H_0$) of seeing a value of the test statistic "more extreme" than the one observed


Day 33-One Sample Testing for a (Normal) Standard Deviation/(Large) Two-Sample Inference for Means

(Normal Distribution) Testing $H_0\colon \sigma^2 = \#$ Uses

$$X^2 = \frac{(n-1)s^2}{\#}$$

and a $\chi^2_{n-1}$ Reference Distribution

Large $n_1$ and $n_2$ (Independent Samples) Confidence Limits for $\mu_1 - \mu_2$ are

$$\bar{x}_1 - \bar{x}_2 \pm z\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

and a Test Statistic (for $H_0\colon \mu_1 - \mu_2 = \#$) is

$$Z = \frac{\bar{x}_1 - \bar{x}_2 - \#}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

With Standard Normal Reference Distribution

Day 34-(Small) Two-Sample Inference for (Normal) Means

(Somewhat Approximate) Small $n_1$ or $n_2$ (Normal Distribution) (Independent Samples) Confidence Limits for $\mu_1 - \mu_2$ are

$$\bar{x}_1 - \bar{x}_2 \pm t\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

(for $t$ a Percentage Point of the $t$ Distribution With $d.f. = \min(n_1 - 1, n_2 - 1)$) and a Test Statistic (for $H_0\colon \mu_1 - \mu_2 = \#$) is

$$T = \frac{\bar{x}_1 - \bar{x}_2 - \#}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

With $t$ Reference Distribution With $d.f. = \min(n_1 - 1, n_2 - 1)$


Day 35-Inference for a Mean Difference/a Difference in Proportions

When Paired Values $(x, y)$ Can Sensibly be Reduced to Differences

$$d = x - y$$

and $n$ of These to $\bar{d}$ and $s_d$, One-Sample Inference Formulas Apply to Inference for $\mu_d$.

Large $n_1, n_2$ (Independent Samples) Confidence Limits for $p_1 - p_2$ are

$$\hat{p}_1 - \hat{p}_2 \pm z\sqrt{\frac{\tilde{p}_1(1 - \tilde{p}_1)}{n_1} + \frac{\tilde{p}_2(1 - \tilde{p}_2)}{n_2}}$$

(Where $\tilde{p}_1 = (n_1\hat{p}_1 + 2)/(n_1 + 4)$ and $\tilde{p}_2 = (n_2\hat{p}_2 + 2)/(n_2 + 4)$.)

A Test Statistic for $H_0\colon p_1 - p_2 = 0$ is

$$Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})}\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

With a Standard Normal Reference Distribution.

Day 36-Inference for a Ratio of (Normal) Variances

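Where $s_1$ and $s_2$ are Based on Independent Samples from Normal Distributions With Respective Standard Deviations $\sigma_1$ and $\sigma_2$,

$$F = \frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2}$$

Has the (Snedecor) $F$ Distribution With $n_1 - 1$ and $n_2 - 1$ Degrees of Freedom.

Hence, Confidence Limits for $\sigma_1/\sigma_2$ are

$$\frac{s_1}{s_2\sqrt{F_{\text{upper}}}} \quad \text{and} \quad \frac{s_1}{s_2\sqrt{F_{\text{lower}}}}$$

and $H_0\colon \sigma_1 = \sigma_2$ is Tested Using

$$F = \frac{s_1^2}{s_2^2}$$

and an $F$ Reference Distribution With $n_1 - 1$ and $n_2 - 1$ Degrees of Freedom.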


Day 37-Least Squares Fitting of a Line

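Based on $n$ Data Pairs $(x_1, y_1), \ldots, (x_n, y_n)$ The "Least Squares Line" Through the Scatterplot Has

$$\text{slope } b_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2} = \frac{\sum_{i=1}^{n} x_i y_i - \dfrac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}}{\sum_{i=1}^{n} x_i^2 - \dfrac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}$$

$$\text{intercept } b_0 = \bar{y} - b_1\bar{x}$$

The Sample Correlation Between $x$ and $y$ is

$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \cdot \sum_{i=1}^{n}(y_i - \bar{y})^2}} = \frac{\sum_{i=1}^{n} x_i y_i - \dfrac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}}{\sqrt{\left(\sum_{i=1}^{n} x_i^2 - \dfrac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right)\left(\sum_{i=1}^{n} y_i^2 - \dfrac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}\right)}}$$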
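The slope, intercept, and sample correlation formulas can be computed directly from the data pairs. The sketch below implements the "deviation" forms and checks the slope against the equivalent "shortcut" form; the function names and the five data pairs are hypothetical.

```python
import math

def least_squares_line(xs, ys):
    """Return (b0, b1) for the least squares line through the (x_i, y_i)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

def slope_shortcut(xs, ys):
    """Slope from raw sums: the second (equivalent) form of b1."""
    n = len(xs)
    numerator = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    denominator = sum(x * x for x in xs) - sum(xs) ** 2 / n
    return numerator / denominator

def sample_correlation(xs, ys):
    """Sample correlation r between x and y."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data pairs.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = least_squares_line(xs, ys)
r = sample_correlation(xs, ys)
print(round(b0, 4), round(b1, 4), round(r, 4))
```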

Day 38-Coefficient of Determination and the SLR Model

Based on $n$ Data Pairs $(x_1, y_1), \ldots, (x_n, y_n)$ and Least Squares Fitted Values $\hat{y}_i = b_0 + b_1 x_i$

$$SSTot = (n-1)s_y^2 = \sum_{i=1}^{n}(y_i - \bar{y})^2, \quad SSE = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2 \quad \text{and} \quad SSR = SSTot - SSE$$

Then

$$R^2 = \frac{SSR}{SSTot} = (\text{sample correlation of } y \text{ and } \hat{y})^2 \quad (= r^2 \text{ in SLR only})$$

The (Normal) Simple Linear Regression Model is

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i \quad \text{for independent } N(0, \sigma^2) \text{ random "errors" } \varepsilon_i$$

(Responses are Independent Normal Variables with Means $\mu_{y|x} = \beta_0 + \beta_1 x$ and Constant Standard Deviation, $\sigma$)


Day 39-Inference Under the SLR Model 1

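Single-Number Estimates of SLR Model Parameters are

$$\hat{\sigma} = s \equiv \sqrt{\frac{SSE}{n-2}}, \quad \hat{\beta}_1 = b_1, \quad \text{and} \quad \hat{\beta}_0 = b_0$$

Using $(n-2)s^2/\sigma^2 \sim \chi^2_{n-2}$, Confidence Limits for $\sigma$ are

$$s\sqrt{\frac{n-2}{\chi^2_{\text{Upper}}}} \quad \text{and} \quad s\sqrt{\frac{n-2}{\chi^2_{\text{Lower}}}}$$

Since $b_1$ is Normal With Mean $\beta_1$ and StdDev $\sigma\big/\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}$, Write $SE_{b_1} = s\big/\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}$ and Have Confidence Limits for $\beta_1$

$$b_1 \pm t \cdot SE_{b_1}$$

And Test $H_0\colon \beta_1 = \#$ Using a $t_{n-2}$ Reference Distribution for

$$T = \frac{b_1 - \#}{SE_{b_1}}$$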


Day 40-Inference Under the SLR Model 2

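Since $\hat{y} = b_0 + b_1 x_{\text{new}}$ Has Mean $\mu_{y|x_{\text{new}}} = \beta_0 + \beta_1 x_{\text{new}}$ and StdDev

$$\sigma\sqrt{\frac{1}{n} + \frac{(x_{\text{new}} - \bar{x})^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}}$$

Write

$$SE_{\hat{y}} = s\sqrt{\frac{1}{n} + \frac{(x_{\text{new}} - \bar{x})^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}}$$

Confidence Limits for $\mu_{y|x_{\text{new}}} = \beta_0 + \beta_1 x_{\text{new}}$ are Then

$$\hat{y} \pm t \cdot SE_{\hat{y}}$$

And $H_0\colon \mu_{y|x_{\text{new}}} = \#$ May Be Tested Using a $t_{n-2}$ Reference Distribution for

$$T = \frac{\hat{y} - \#}{SE_{\hat{y}}}$$

Prediction Limits for $y_{\text{new}}$ at $x = x_{\text{new}}$ Are

$$\hat{y} \pm t \cdot s\sqrt{1 + \frac{1}{n} + \frac{(x_{\text{new}} - \bar{x})^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}}$$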


Day 41-SLR and "ANOVA"

Breaking Down SSTot Into SSR and SSE Is a Kind of "ANalysis Of VAriance" (of $y$). Further, an $F$ Test of $H_0\colon \beta_1 = 0$ Equivalent to a Two-Sided $t$ Test Can be Based on $F = MSR/MSE$ and an $F_{1,n-2}$ Reference Distribution. Calculations Are Summarized in a Special "ANOVA" Table.

ANOVA Table (for SLR)

Source      SS     df     MS                 F
Regression  SSR    1      MSR = SSR/1        F = MSR/MSE
Error       SSE    n-2    MSE = SSE/(n-2)
Total       SSTot  n-1

The Facts that $E\,MSE = \sigma^2$ and $E\,MSR = \sigma^2 + \beta_1^2 \sum_{i=1}^{n}(x_i - \bar{x})^2$ Provide Motivation for Rejecting $H_0$ For Large Observed $F$.
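The entries of the SLR ANOVA table can be computed from a fitted line, and the stated equivalence between the $F$ test and the two-sided $t$ test of $H_0\colon \beta_1 = 0$ shows up numerically as $F = T^2$. The sketch below is an illustration only (in the course these values come from JMP); the function name, the dictionary keys, and the data are hypothetical.

```python
import math

def slr_anova(xs, ys):
    """Sums of squares, R^2, F, and the t statistic for H0: beta1 = 0 in SLR."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    b0 = ybar - b1 * xbar
    fitted = [b0 + b1 * x for x in xs]

    sstot = sum((y - ybar) ** 2 for y in ys)
    sse = sum((y - yhat) ** 2 for y, yhat in zip(ys, fitted))
    ssr = sstot - sse
    msr = ssr / 1
    mse = sse / (n - 2)
    se_b1 = math.sqrt(mse) / math.sqrt(sxx)   # s / sqrt(sum (x_i - xbar)^2)
    return {"SSR": ssr, "SSE": sse, "SSTot": sstot,
            "R2": ssr / sstot, "F": msr / mse, "t": b1 / se_b1}

# Hypothetical data pairs.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

table = slr_anova(xs, ys)
print(round(table["R2"], 4), round(table["F"], 1))
print(math.isclose(table["F"], table["t"] ** 2, rel_tol=1e-9))
```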


Day 42-Practical Considerations 1

The Possibility that Neither Interpolation Nor Extrapolation is Completely Safe Must Be Considered When Using a Fitted Equation.

Rational Practice Requires That One Investigate the Plausibility of a Regression Model Before Basing Inferences on It.

In "Single x" Contexts One Should Plot $y$ versus $x$ Looking for a Trend Consistent With the Fitted Model and for Constant Spread Around That Trend.

In General, "Residuals"

$$e_i = y_i - \hat{y}_i$$

Should Be "Patternless" and "Normal-looking."

Common Practice is to Normal-Plot and Plot Against All Predictors (and $\hat{y}_i$ and Other Potential Predictors) "Standardized" Residuals

$$e_i^* = \frac{e_i}{SE_{e_i}} = \frac{e_i}{s\sqrt{1 - \dfrac{1}{n} - \dfrac{(x_i - \bar{x})^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}}} \quad \text{in SLR}$$


Day 43-Practical Considerations 2

For $c$ Different (Sets of) "x Conditions" in the Data Let

$$SS_{\text{cond } j} = (n_{\text{cond } j} - 1)\, s^2_{\text{cond } j}$$

And Define (A "Pure Error" Sum of Squares)

$$SSPE = \sum_{j=1}^{c} SS_{\text{cond } j}$$

With Degrees of Freedom $n - c = \sum_{j=1}^{c}(n_{\text{cond } j} - 1)$. Then (a "Lack of Fit" Sum of Squares) Is

$$SSLoF = SSE - SSPE$$

With Degrees of Freedom

$$d.f._{LoF} = \text{error } d.f. - (n - c)$$

Then $H_0\colon$ the fitted model is appropriate Can Be Tested Using

$$F = \frac{SSLoF/d.f._{LoF}}{SSPE/(n - c)}$$


Day 44-SLR Practical Considerations 3/MLR Model

Sometimes, Replacing $y$ With Some Function of $y$ (Like, e.g., $y' = \ln y$) and/or $x$'s With Some Function(s) Thereof Can Make The Simple Technology of Regression Analysis Applicable to the Analysis of a Data Set.

The Multiple Linear Regression Model is

$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_k x_{ki} + \varepsilon_i$$

Where the $\varepsilon_i$ Are Independent Normal With Mean 0 and Standard Deviation $\sigma$.

Least Squares (e.g. Implemented in JMP) Can Be Used to Fit

$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k$$

(That is, Estimate the $\beta$'s). The Corresponding Estimate of $\sigma$ Is

$$s = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - (k + 1)}}$$


Day 45-MLR R-Squared/Overall (Model Utility) F Test

In MLR as in SLR,

$$SSTot = (n-1)s_y^2 = \sum_{i=1}^{n}(y_i - \bar{y})^2, \quad SSE = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2$$

$$\text{and} \quad SSR = SSTot - SSE \quad \text{and} \quad R^2 = \frac{SSR}{SSTot}$$

The Basic ANOVA Table for MLR Is

ANOVA Table (for MLR)

Source      SS     df       MS                   F
Regression  SSR    k        MSR = SSR/k          F = MSR/MSE
Error       SSE    n-k-1    MSE = SSE/(n-k-1)
Total       SSTot  n-1

Which Organizes an $F$ Test of $H_0\colon \beta_1 = \beta_2 = \cdots = \beta_k = 0$


Day 46-MLR Partial F Tests/Fitted Coefficients

In the (Full) MLR Model

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$$

For $l < k$, The Hypothesis $H_0\colon \beta_{l+1} = \cdots = \beta_k = 0$ Is That the Full Model is Not Clearly Better Than the Reduced Model

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_l x_l + \varepsilon$$

This Can Be Tested Using

$$F = \frac{\left(SSR(\text{full}) - SSR(\text{reduced})\right)/(k - l)}{SSE(\text{full})/(n - k - 1)}$$

With an $F_{k-l,\,n-k-1}$ Reference Distribution

The Fitted Coefficient $b_l$ Is Normal With Mean $\beta_l$ And Standard Deviation

$$\sigma \cdot (\text{a complicated function of the values of the predictors } x_{li})$$


Day 47-Confidence and Prediction Limits in MLR

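The Fitted Value $\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k$ Is Normal With Mean $\mu_{y|x_1,\ldots,x_k} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k$ And Standard Deviation

$$\sigma \cdot (\text{a complicated function of the values of the predictors } x_{li})$$

Replacing $\sigma$ with $s$ In the Previous Two Standard Deviations Produces Standard Errors $SE_{b_l}$ and $SE_{\hat{y}}$ That Are Obtained From JMP (NOT "By Hand")

Confidence and Prediction Limits Are Then

$$s\sqrt{\frac{n - k - 1}{\chi^2_{\text{upper}}}} \quad \text{and} \quad s\sqrt{\frac{n - k - 1}{\chi^2_{\text{lower}}}} \quad \text{for } \sigma$$

$$b_l \pm t \cdot SE_{b_l} \quad \text{for } \beta_l$$

$$\hat{y} \pm t \cdot SE_{\hat{y}} \quad \text{for } \mu_{y|x_1,\ldots,x_k}$$

$$\hat{y} \pm t \cdot \sqrt{s^2 + (SE_{\hat{y}})^2} \quad \text{for } y_{\text{new}} \text{ at } (x_1, \ldots, x_k)$$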


Day 48-Practical Considerations

Model Checking Involves Residual Plots (Residuals are $e_i = y_i - \hat{y}_i$ and Standardized Residuals are $e_i^* = e_i/SE_{e_i}$ for $SE_{e_i} = s \cdot$ (a complicated function of the values of the predictors $x_{li}$)), Lack of Fit Tests, and Examination of the "PRESS" Statistic. For $\hat{y}_{(i)}$ A Fitted Value For the $i$th Case Obtained Not Using the Case in the Fitting

$$PRESS = \sum_{i=1}^{n}(y_i - \hat{y}_{(i)})^2 \ge \sum_{i=1}^{n}(y_i - \hat{y}_i)^2 = SSE$$

(Ideally, PRESS Is Not Much Larger Than SSE)

Extrapolation is a Potentially Big Issue in MLR

Transformations Extend the Potential Applications of MLR

Variable/Model Selection in MLR Involves Balancing "Good Fit" versus a Small Number of Predictor Variables

Formal Tools are Partial F Tests and Tests for Lack of Fit

Examination of $R^2$, $MSE$, and $C_p$ For "All Possible Regressions" Is A More Flexible Informal Approach
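The PRESS statistic can be computed by refitting with each case left out in turn. The sketch below does this for the single-predictor (SLR) special case to keep the fitting code short; in MLR the idea is the same but the refits would be done by regression software. The function names and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Least squares (b0, b1) for a single predictor."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    return ybar - b1 * xbar, b1

def sse(xs, ys):
    """Sum of squared ordinary residuals y_i - yhat_i."""
    b0, b1 = fit_line(xs, ys)
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

def press(xs, ys):
    """Sum of squared deleted residuals y_i - yhat_(i)."""
    total = 0.0
    for i in range(len(xs)):
        # Refit without case i, then predict case i from that fit.
        b0, b1 = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        total += (ys[i] - (b0 + b1 * xs[i])) ** 2
    return total

# Hypothetical data pairs.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

print(sse(xs, ys), press(xs, ys))
print(press(xs, ys) >= sse(xs, ys))  # True
```

A PRESS value far above SSE suggests that the fit depends heavily on individual cases.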


Day 49-Practical Considerations-Model Selection

Considering Submodels (Reduced Models)

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon$$

of a (Full) Model

$$y = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p + \beta_{p+1} x_{p+1} + \cdots + \beta_k x_k + \varepsilon$$

Assumed to Produce Correct Values of The Means $\mu_{y_i}$,

$$C_p = (n - k - 1)\left(\frac{SSE_p}{SSE_k}\right) + 2(p + 1) - n$$

(Under the Full Model) Estimates a Quantity that is

$$p + 1 + \left(\text{a positive measure of how badly the reduced model does at fitting the } \mu_{y_i}\right)$$

So, Simple (Small $p$) Models With Big $R^2$, Small $MSE$, and $C_p \approx p + 1$ Are Desired


Day 50-Practical Considerations-Model Selection/Multicollinearity

JMP Fit Model "Stepwise," Lack of Fit, and PRESS

The Complication of Multicollinearity Arises in MLR When One or More of the Predictors is Nearly a Linear Combination of Others of the Predictors (and Is Therefore Essentially Redundant in Practical Terms). When This Occurs (Besides There Being Technical Problems Associated With Solution of the Least Squares Fitting Problem):

While Good Prediction For Cases Like Those in the Data Set May Be Possible, Extrapolation Is Extremely Dangerous, and

Assessment of Individual Importance of Particular Predictors Is Often Impossible. (This Produces Big Standard Errors for Individual Coefficients, and Often Individual $b_l$'s That Make No Sense in the Subject-Matter Application.)

Multicollinearity Can Be Prevented If One Gets to Choose $(x_{1i}, x_{2i}, \ldots, x_{ki})$ Combinations (By Making Predictors Uncorrelated).


Day 51-Multicollinearity

With

$e^{(j)}(y_i)$ = the $i$th $y$ residual regressing on all predictor variables except $x_j$ and

$e^{(j)}(x_{ji})$ = the $i$th $x_j$ residual regressing on all predictor variables except $x_j$

JMP Plots Accompanying "Effect Tests" in Fit Model Are Plots Of

$$e^{(j)}(y_i) + \bar{y} \quad \text{versus} \quad e^{(j)}(x_{ji}) + \bar{x}_j$$

So Small Spread In Horizontal Coordinates of Plotted Points Indicates Multicollinearity.

When Predictors Are Uncorrelated, Regression Sums of Squares "Add" and Fitted Coefficients $b_l$ Are The Same For All Models Including $x_l$.


Day 52-Multicollinearity/The One Way Normal Model

Multicollinearity Means That One Only Has $(x_{1i}, x_{2i}, \ldots, x_{ki})$ Essentially In Some Lower-Dimensional (Than $k$) Subspace of $k$-Dimensional Space and Thus Can Hope To Reliably Predict Only There.

One-Way Analyses Are "$r$-Sample" Analyses (Not Unlike the 2-Sample Analyses of Devore Ch 9). They Are Based On A Model For

$$y_{ij} = j\text{th observation in the } i\text{th sample}$$

Of The Form

$$y_{ij} = \mu_i + \varepsilon_{ij}$$

For The $\varepsilon_{ij}$ Independent Normal Random Variables With Mean 0 and Standard Deviation $\sigma$. This Is "Samples From $r$ Normal Populations With Possibly Different Means But A Common Standard Deviation."


Day 53-Inference in the One Way Normal Model

A Single Number Estimate of $\sigma$ Is

$$s_{\text{pooled}} = s_P = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \cdots + (n_r - 1)s_r^2}{(n_1 - 1) + (n_2 - 1) + \cdots + (n_r - 1)}}$$

And Confidence Limits For $\sigma$ Are

$$s_P\sqrt{\frac{n - r}{\chi^2_{\text{upper}}}} \quad \text{and} \quad s_P\sqrt{\frac{n - r}{\chi^2_{\text{lower}}}}$$

In The Context of Lack of Fit In Regression, $(SSPE/\text{d.f. PE}) = s_P^2$.

For Population and Corresponding Sample Linear Combinations of $r$ Means

$$L = c_1\mu_1 + \cdots + c_r\mu_r \quad \text{and} \quad \hat{L} = c_1\bar{y}_1 + \cdots + c_r\bar{y}_r$$

Confidence Limits For $L$ Are

$$\hat{L} \pm t s_P\sqrt{\frac{c_1^2}{n_1} + \cdots + \frac{c_r^2}{n_r}}$$

Day 54-One Way ANOVA and Model Checking

In The One Way Context

$$SSE = \sum_{i,j}(y_{ij} - \bar{y}_i)^2 = (n_1 - 1)s_1^2 + \cdots + (n_r - 1)s_r^2$$

and

$$SSTr = SSTot - SSE = \sum_{i=1}^{r} n_i(\bar{y}_i - \bar{y})^2$$

And The Hypothesis $H_0\colon \mu_1 = \mu_2 = \cdots = \mu_r$ May Be Tested Using

$$F = \frac{MSTr}{MSE} = \frac{SSTr/(r - 1)}{SSE/(n - r)}$$

And An $F_{r-1,\,n-r}$ Reference Distribution.

One Way Residuals and Standardized Residuals Are (Respectively)

$$e_{ij} = y_{ij} - \bar{y}_i \quad \text{and} \quad e_{ij}^* = \frac{e_{ij}}{s_P\sqrt{\dfrac{n_i - 1}{n_i}}}$$

And Are Used In Model Checking Exactly As In Regression.
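The one-way sums of squares, the pooled standard deviation, and the $F$ statistic can be computed directly from the $r$ samples. The sketch below is a small illustration; the function name, the dictionary keys, and the three samples are hypothetical, and the $F$ value would still need to be compared with an $F_{r-1,\,n-r}$ table or software quantile.

```python
import math

def one_way_anova(samples):
    """SSTr, SSE, pooled standard deviation, and F for r samples."""
    r = len(samples)
    n = sum(len(s) for s in samples)
    means = [sum(s) / len(s) for s in samples]
    grand_mean = sum(sum(s) for s in samples) / n

    # SSE: squared deviations of observations from their own sample mean.
    sse = sum((y - m) ** 2 for s, m in zip(samples, means) for y in s)
    # SSTr: squared deviations of sample means from the grand mean, weighted by n_i.
    sstr = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))

    mstr = sstr / (r - 1)
    mse = sse / (n - r)
    return {"SSTr": sstr, "SSE": sse, "sP": math.sqrt(mse), "F": mstr / mse}

# Hypothetical data: r = 3 samples of 3 observations each.
samples = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
result = one_way_anova(samples)
print(round(result["SSTr"], 6), round(result["SSE"], 6),
      round(result["sP"], 6), round(result["F"], 6))
```

Here $MSE = s_P^2$, so the pooled standard deviation falls out of the same table that organizes the $F$ test.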

Day 55-Statistics and Measurement 1

Measurand x

Measurement y

Measurement Error ε = y − x (So y = x + ε)

Standard Modeling Is That $\varepsilon$ Is Normal With Mean $\beta$ (Measurement Bias) And Standard Deviation $\sigma$

Then For Independent Measurements $y_1, y_2, \ldots, y_n$ of a Fixed $x$

$\bar{y} \pm t\dfrac{s_y}{\sqrt{n}}$ Estimates $x + \beta$ (Measurand Plus Bias)

Limits $s_y\sqrt{\dfrac{n - 1}{\chi^2_{\text{Upper}}}}$ and $s_y\sqrt{\dfrac{n - 1}{\chi^2_{\text{Lower}}}}$ Are For $\sigma$ (If Gauge And Operator Are Fixed, This Is A Repeatability Std Dev ... If Each $y_i$ Is From A Different Operator This Is an R&R Std Dev)

If $x$ Varies Independent of $\varepsilon$, The Situation is More Complex and Modeling of Multiple Measurement Depends on Exactly How Data Are Taken ... For a Single $y$, $Ey = \mu_x + \beta$ And $\sigma_y = \sqrt{\sigma_x^2 + \sigma^2}$


Day 56-Statistics and Measurement 2

So if Each of $y_1, y_2, \ldots, y_n$ Has a Different Measurand

$\bar{y} \pm t\dfrac{s_y}{\sqrt{n}}$ Estimates $\mu_x + \beta$ (Average Measurand Plus Bias)

Limits $s_y\sqrt{\dfrac{n - 1}{\chi^2_{\text{Upper}}}}$ and $s_y\sqrt{\dfrac{n - 1}{\chi^2_{\text{Lower}}}}$ Are For $\sqrt{\sigma_x^2 + \sigma^2}$ (A Combination of Measurand Variability and Measurement Variability)

Two Important Applications of This Are Where

Different $x$'s Represent The Truth About Different Items, So $\sigma_x$ Measures Process Variability

Different $x$'s Represent Different Operator-Specific Biases, So $\sigma_x$ Measures Reproducibility Variability

Where a Data Set Has $r$ Measurands and $m$ Measurements Per Measurand, One Way ANOVA Can Help Separate $\sigma_x^2$ And $\sigma^2$

$s_P$ (And Associated Confidence Limits) Estimate $\sigma$

$\sqrt{\max\left(0, \tfrac{1}{m}(MSTr - MSE)\right)}$ (And Limits Provided By JMP If You Know How To Ask) Estimate $\sigma_x$
