
A Short Course in Linear and Logistic Regression

Suman Guha
Assistant Professor

Department of Statistics
Presidency University, Kolkata

July 26, 2020


Linear Regression as Descriptive Measure

Observations are taken on two features - say height (x) and weight (y) of individuals.

In situations where (x) and (y) show no interrelationship, there is no point doing regression.

Figure: Artificially simulated dataset showing no dependence between x and y.


Linear Regression as Descriptive Measure

Fortunately, most of the time x and y turn out to be dependent!

Figure: (x) car speed in miles per hour vs (y) stopping distance in feet (the cars data).

We want an approximate formula y ≈ f(x) for stopping distance (y) in terms of car speed (x) - the regression problem.

Why?


Linear Regression as Descriptive Measure

To understand the nature of dependence between (x) and (y).

Sometimes (y) may be costly/difficult to measure (total annual income) but (x) may be measured easily (total annual expenditure) - we can use the formula to predict y∗ using x∗.

What type of formula? - f(x) = ax³ + b√x + c?

No, we want a formula of the form f(x) = a + bx - the equation of a straight line.

Reason? (i) Mathematically simple.

(ii) Most of the time linear regression performs quite well!

How to get the values of a and b? - we want a line that passes through the middle of the points - obtained by minimizing ∑_{i=1}^n (y_i − a − bx_i)².

A closed-form solution is available: f(x) = (ȳ − (Cov(x,y)/Var(x)) x̄) + (Cov(x,y)/Var(x)) x = ȳ + (Cov(x,y)/Var(x)) (x − x̄).
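As a quick check, a minimal R sketch (using the built-in cars data) computing a and b from the closed form above and comparing them with R's lm():

    x <- cars$speed; y <- cars$dist
    b <- cov(x, y) / var(x)              # Cov(x,y) / Var(x)
    a <- mean(y) - b * mean(x)           # ybar - b * xbar
    c(intercept = a, slope = b)
    coef(lm(dist ~ speed, data = cars))  # should match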


Linear Regression as Descriptive Measure

(x_i, y_i) - the given data. Ŷ_i = f(x_i) are the fitted values and e_i = y_i − Ŷ_i the residuals.

Figure: Scatter plot of the cars data with the regression line, fitted values and residuals.

Minimizing ∑_{i=1}^n (y_i − a − bx_i)² w.r.t. a, b - the principle of least squares (LS) - gives the LS regression line.

The LS regression line is highly vulnerable to outlying observations.
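A minimal R sketch of these quantities on the cars data; fitted() returns the Ŷ_i and resid() the e_i:

    fit <- lm(dist ~ speed, data = cars)
    Yhat <- fitted(fit)                              # fitted values Yhat_i = f(x_i)
    e <- resid(fit)                                  # residuals e_i = y_i - Yhat_i
    all.equal(unname(e), cars$dist - unname(Yhat))   # should be TRUE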


Linear Regression as Descriptive Measure

Figure: Effect of a single outlier on the LS regression line (cars$speed vs cars$dist, cars data with an artificial outlier).

Two possibilities: (i) detect and drop the outlier, or (ii) apply an outlier-resistant regression.

Minimizing ∑_{i=1}^n (y_i − a − bx_i)² w.r.t. a, b is equivalent to minimizing (1/n) ∑_{i=1}^n (y_i − a − bx_i)² - the mean of the (y_i − a − bx_i)² - w.r.t. a, b.

Why not minimize the median of the (y_i − a − bx_i)² w.r.t. a, b? - least median of squares (LMS) regression.
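A minimal sketch of LMS fitting in R, assuming the MASS package is available (its lqs() function supports method = "lms"); the added outlier here is hypothetical:

    library(MASS)
    set.seed(1)                                              # lqs resamples randomly
    cars2 <- rbind(cars, data.frame(speed = 49, dist = 0))   # artificial outlier
    coef(lm(dist ~ speed, data = cars2))                     # LS line: dragged by the outlier
    coef(lqs(dist ~ speed, data = cars2, method = "lms"))    # LMS line: barely moves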


Linear Regression as Descriptive Measure

The LMS regression line is less affected by outliers - it is outlier-resistant.

Figure: Effect of the outlier on the LMS regression line (cars$speed vs cars$dist).

So far only descriptive statistics.

We want to understand the reliability/accuracy of these regression lines - this requires specifying a suitable statistical model for the data.


Linear Regression Models

Simple linear regression model:

[Y_1 = y_1, · · · , Y_n = y_n | X_1 = x_1, · · · , X_n = x_n] ∼ (1/(√(2π) σ_ε))ⁿ exp(−(1/(2σ²_ε)) ∑_{i=1}^n (y_i − a − bx_i)²).

Model parameters - a, b, σ_ε.

The model looks unfamiliar?

The model is nothing but a family of MVN distributions indexed by the unknown parameters a, b, σ_ε.

More familiar specification - Y = Xβ + ε; ε ∼ MVN(0, σ²_ε I_n).

Y = (y_1, y_2, · · · , y_n)′, X is the n × 2 matrix whose i-th row is (1, x_i), and ε = (ε_1, ε_2, · · · , ε_n)′ are the unobserved random errors.

β = (β_0, β_1)′ = (a, b)′.
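A minimal R sketch of this specification, simulating one draw of Y given X for the cars speeds; the values a = −17, b = 4, σ_ε = 15 are purely illustrative assumptions:

    set.seed(1)
    x <- cars$speed
    X <- cbind(1, x)                             # n x 2 design matrix, i-th row (1, x_i)
    beta <- c(-17, 4)                            # (a, b) - hypothetical values
    eps <- rnorm(length(x), mean = 0, sd = 15)   # eps ~ MVN(0, sigma_eps^2 I_n)
    y <- drop(X %*% beta) + eps                  # one simulated draw of Y given X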


Linear Regression Models

The model is fitted using the maximum likelihood method.

Inferential goal - estimating β and σ²_ε.

The mle of β is β̂ = Q_X y = (X′X)⁻¹X′y, writing Q_X = (X′X)⁻¹X′ - the same as the LS regression estimates.

The mle of σ²_ε is σ̂²_ε = (∑_{i=1}^n e_i²)/n - biased.

An unbiased estimator is σ̃²_ε = (∑_{i=1}^n e_i²)/(n − 2).

We concentrate only on β from now on.

How good/reliable are these estimates? - calculate standard errors.

Var(β̂) = Var(Q_X y) = Q_X Var(y) Q_X′ = Q_X σ²_ε I_n Q_X′ = σ²_ε Q_X Q_X′ = σ²_ε (X′X)⁻¹X′X(X′X)⁻¹ = σ²_ε (X′X)⁻¹.

The estimate of Var(β̂) is σ̃²_ε (X′X)⁻¹ (we use the unbiased estimator σ̃²_ε, not the mle σ̂²_ε).

The square roots of its diagonal entries give the estimated standard errors se(β̂_0) and se(β̂_1).
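A minimal R sketch computing β̂ and the standard errors by hand on the cars data, checked against summary(lm):

    y <- cars$dist; X <- cbind(1, cars$speed)
    n <- nrow(X)
    XtX_inv <- solve(t(X) %*% X)
    beta_hat <- XtX_inv %*% t(X) %*% y                           # (X'X)^{-1} X'y
    e <- y - X %*% beta_hat                                      # residuals
    s2 <- sum(e^2) / (n - 2)                                     # unbiased estimate of sigma_eps^2
    se <- sqrt(diag(s2 * XtX_inv))                               # se(beta0_hat), se(beta1_hat)
    cbind(estimate = drop(beta_hat), se = se)
    summary(lm(dist ~ speed, data = cars))$coefficients[, 1:2]   # should match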


Linear Regression Models

Another inferential goal - testing for β.

Individual test of significance H_0: β_0 = 0 vs H_1: β_0 ≠ 0 (test of intercept).

Test statistic T = β̂_0 / se(β̂_0).

Null distribution of the test statistic ∼ t_{n−2} - the cutoff is obtained using a t_{n−2}-distribution table.

Practitioners prefer the p-value - P(|T| > |T_observed|) = 2 P(T > |T_observed|), where T ∼ t_{n−2}.

Individual test of significance H_0: β_1 = 0 vs H_1: β_1 ≠ 0 (test of slope).

Test statistic T = β̂_1 / se(β̂_1).

Null distribution of the test statistic ∼ t_{n−2} - the cutoff is obtained using a t_{n−2}-distribution table.

Joint test of significance H_0: β = 0 vs H_1: β ≠ 0.

Test statistic F = β̂′(X′X)β̂ / (2σ̃²_ε).

Null distribution of the test statistic ∼ F_{2,n−2} - the cutoff is obtained using an F_{2,n−2}-distribution table.

p-value - P(F > F_observed) where F ∼ F_{2,n−2}.
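A minimal R sketch of the slope test on the cars data: the t statistic and its two-sided p-value from the t_{n−2} distribution, matching the summary(lm) output:

    fit <- lm(dist ~ speed, data = cars)
    ct <- summary(fit)$coefficients
    t_obs <- ct["speed", "Estimate"] / ct["speed", "Std. Error"]      # T = beta1_hat / se(beta1_hat)
    p <- 2 * pt(abs(t_obs), df = nrow(cars) - 2, lower.tail = FALSE)  # two-sided p-value
    c(t = t_obs, p = p)                                               # matches "t value" / "Pr(>|t|)"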


Linear Regression Models

A confidence interval for β_0 can be obtained by inverting the test statistic β̂_0 / se(β̂_0).

A confidence interval for β_1 can be obtained by inverting the test statistic β̂_1 / se(β̂_1).

Confidence interval: [β̂_1 − t_{n−2,α/2} se(β̂_1), β̂_1 + t_{n−2,α/2} se(β̂_1)].

t_{n−2,α/2} is the upper α/2 cutoff point.

An ellipsoidal joint confidence set for β is obtained by inverting the test statistic β̂′(X′X)β̂ / (2σ̃²_ε).

Confidence ellipsoid - P(β : (β − β̂)′(X′X)(β − β̂) ≤ 2σ̃²_ε F_{2,n−2,α}) = 1 − α.

F_{2,n−2,α} is the upper α cutoff point.

All of the above findings are useless if the model fit is poor - we need to check whether the model is appropriate for the data.

Model diagnostic checking.
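A minimal R sketch of the slope interval β̂_1 ± t_{n−2,α/2} se(β̂_1) on the cars data, checked against R's confint():

    fit <- lm(dist ~ speed, data = cars)
    ct <- summary(fit)$coefficients
    alpha <- 0.05
    tc <- qt(1 - alpha / 2, df = df.residual(fit))                       # t_{n-2, alpha/2}
    ct["speed", "Estimate"] + c(-1, 1) * tc * ct["speed", "Std. Error"]  # by hand
    confint(fit, "speed", level = 1 - alpha)                             # should match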


Linear Regression Models

Appropriateness of the Gauss-Markov assumptions:
(i) Linearity: the relationship between X and the mean of Y is linear (E(Y|X) = Xβ).
(ii) Homoscedasticity: the variance of the residuals is the same across x_1, x_2, · · · , x_n.
(iii) Uncorrelatedness: observations are uncorrelated with each other.

Normality: for any fixed value x_i, [Y_i | X_i = x_i] is normally distributed.

Normality + (iii) uncorrelatedness ⇒ observations are independent of each other.

Check for potentially bad points which may lead to a poor model fit:
(i) Outliers: an outlier is an observation that has a large residual; in other words, the observed value for the point is very different from that predicted by the regression model.
(ii) Leverage points: a leverage point is an observation whose value of x_i is far away from the mean of x_1, x_2, · · · , x_n.
(iii) Influential observations: an influential observation is one that changes the slope of the line; thus, influential points have a large influence on the fit of the model.


Linear Regression Models

Linearity - check the plot of fitted values Ŷ_i vs residuals e_i for any pattern - residuals randomly and closely distributed around the x-axis indicate linearity.

Homoscedasticity - check the plot of fitted values Ŷ_i vs residuals e_i to see if the spread changes as we move along the x-axis - a changing spread means heteroscedasticity.

Figure: Residuals vs Fitted plot for lm(dist ~ speed) - clear indication of nonlinearity and heteroscedasticity.
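These diagnostic plots come straight from R's plot method for lm objects; a minimal sketch (which = 3 gives the Scale-Location plot discussed on the next slide):

    fit <- lm(dist ~ speed, data = cars)
    plot(fit, which = 1)   # Residuals vs Fitted: linearity / spread check
    plot(fit, which = 3)   # Scale-Location: sqrt(|standardised residuals|) vs fitted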


Linear Regression Models

Homoscedasticity - check the plot of fitted values Ŷ_i vs the square root of the absolute standardised residuals, √(|e_i / (σ̃_ε √(1 − h_ii))|), to see if the spread changes as we move along the x-axis - a changing spread means heteroscedasticity.

Figure: Scale-Location plot for lm(dist ~ speed) - clear indication of heteroscedasticity.

This plot is more appropriate for checking homoscedasticity, since the Var(e_i) differ from one another and are not the same as Var(ε_i).
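A minimal R sketch of the same check, again assuming the lm(dist ~ speed) fit; plot(fm1, which = 3) draws the Scale-Location panel, and rstandard() returns the standardized residuals used on its vertical axis:

fm1 <- lm(dist ~ speed, data = cars)
plot(fm1, which = 3)                  # Scale-Location: sqrt(|standardized residuals|) vs fitted

# The same quantity computed by hand:
std_res <- rstandard(fm1)             # e_i / (sigma_hat * sqrt(1 - h_ii))
plot(fitted(fm1), sqrt(abs(std_res)),
     xlab = "Fitted values", ylab = "sqrt(|standardized residuals|)")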


Var(e_i) = σ²_ε (1 − h_ii) - so, its estimate is σ̂²_ε (1 − h_ii).

So, the standardised residual is e_i / √(σ̂²_ε (1 − h_ii)) = e_i / (σ̂_ε √(1 − h_ii)).

h_ii is the i-th leverage value - the i-th diagonal entry of the matrix X(X′X)⁻¹X′ = P_X.

P_X (some refer to it as the hat matrix H) is an orthogonal projection matrix - idempotent and symmetric - also, ŷ = P_X y.

One can use the Breusch-Pagan test for checking homoscedasticity - its statistic is asymptotically χ²-distributed.
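A short R sketch of these quantities for the running lm(dist ~ speed) example; the Breusch-Pagan test assumes the lmtest package is installed:

fm1 <- lm(dist ~ speed, data = cars)
h   <- hatvalues(fm1)                       # leverage values h_ii (diagonal of P_X)
s   <- summary(fm1)$sigma                   # estimate of sigma_epsilon
std <- residuals(fm1) / (s * sqrt(1 - h))   # standardized residuals; equals rstandard(fm1)

library(lmtest)                             # assumed installed
bptest(fm1)                                 # Breusch-Pagan test; asymptotically chi-squared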

Uncorrelatedness: Plot the sample autocorrelation function of the residuals.


[Sample ACF plot of fm1$residuals: autocorrelations up to lag 15.]

Figure: Indication of uncorrelatedness.

One can also perform the Durbin-Watson test and the Box-Pierce test to check whether there is any autocorrelation.
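In R (a sketch, again with the lm(dist ~ speed) fit; dwtest() assumes the lmtest package):

fm1 <- lm(dist ~ speed, data = cars)
acf(fm1$residuals)                             # sample ACF of the residuals

library(lmtest)                                # assumed installed
dwtest(fm1)                                    # Durbin-Watson test
Box.test(fm1$residuals, type = "Box-Pierce")   # Box-Pierce test (base R)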


Normality: Q-Q plot of standardised/studentized residuals.

[Normal Q-Q plot for lm(dist ~ speed): standardized residuals against theoretical N(0, 1) quantiles, with observations 23, 35 and 49 labelled.]

Figure: Indication of non-normality.

One can also perform the Shapiro-Wilk test and the Kolmogorov-Smirnov test to check for departure from normality.
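A minimal R sketch, assuming the same fit; both tests are in base R's stats package:

fm1 <- lm(dist ~ speed, data = cars)
plot(fm1, which = 2)               # Normal Q-Q plot of the standardized residuals

shapiro.test(rstandard(fm1))       # Shapiro-Wilk test
ks.test(rstandard(fm1), "pnorm")   # Kolmogorov-Smirnov test against N(0, 1)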


Outliers: Check the fitted value Ŷ_i vs residual e_i plot for large values - potential outliers.

Leverage points: Check for points with high leverage values h_ii.

Recall that 0 ≤ h_ii ≤ 1.

Influential observations: Can be detected by looking at the standardised residuals vs leverage plot.

[Residuals vs Leverage plot for lm(dist ~ speed): standardized residuals against leverage h_ii, with the Cook's distance contour at 0.5 and observations 23, 39 and 49 labelled.]

Figure: A few influential observations.
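A minimal R sketch for this plot and the underlying quantities, assuming the lm(dist ~ speed) fit:

fm1 <- lm(dist ~ speed, data = cars)
plot(fm1, which = 5)   # standardized residuals vs leverage, with Cook's distance contours
hatvalues(fm1)         # the individual leverage values h_ii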


There are also some additional numerical diagnostic measures for detecting potentially influential observations.

Cook's distance:

D_i = (1/2) (e_i / (σ̂_ε √(1 − h_ii)))² · h_ii / (1 − h_ii) = (1/2) (standardized residual)² · h_ii / (1 − h_ii).

So, Cook's D is a function of the standardized residual and the leverage value - it can be plotted as nonlinear contours in the residuals vs leverage plot.

High leverage values (close to 1) make Cook's distance very large - a highly influential observation.

DFFIT: DFFIT_i = the difference in fit when we drop the i-th observation.

Relationship between D_i and DFFIT_i: D_i = (1/2) (σ̂²_ε(i) / σ̂²_ε) DFFIT_i², where σ̂²_ε(i) is the error-variance estimate with the i-th observation dropped.

If the model diagnostic checking turns out satisfactory, then we check how well the model fits the data.
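Both measures are available in base R; a sketch for the running example (the 4/n cutoff below is a common rule of thumb, not something from these slides):

fm1 <- lm(dist ~ speed, data = cars)
D   <- cooks.distance(fm1)        # Cook's distance D_i
dff <- dffits(fm1)                # DFFIT_i
which(D > 4 / nrow(cars))         # flag possibly influential points (assumed cutoff)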


There are several such goodness-of-fit measures.

These measures are useful for selecting a single best model among several competing models.

R-squared - R² = Var(Ŷ)/Var(y) = Σ_{i=1}^n (Ŷ_i − Ȳ)² / Σ_{i=1}^n (y_i − ȳ)²; 0 ≤ R² ≤ 1.

Problem with R² - it tends to select overfitted models.

Adjusted R-squared - R²_adj = 1 − (n − 1)(1 − R²)/(n − 2) - the higher the better - it can be negative!

AIC - −2 ln L(β̂_mle, σ̂²_ε,mle | y, X) + 2(2 + 1) - the lower the better.

BIC - −2 ln L(β̂_mle, σ̂²_ε,mle | y, X) + ln(n)(2 + 1) - the lower the better.

BIC penalizes complex models more severely - better to use BIC than AIC.
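For the running cars fit, all four measures are one-liners in R (a quick sketch):

fm1 <- lm(dist ~ speed, data = cars)
summary(fm1)$r.squared       # R-squared
summary(fm1)$adj.r.squared   # adjusted R-squared
AIC(fm1)                     # the lower the better
BIC(fm1)                     # penalizes complexity more than AIC once n >= 8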


Multiple linear regression model:

More familiar specification - Y = Xβ + ε; ε ∼ MVN(0, σ²_ε I_n), where

Y = (y_1, y_2, ..., y_n)′, β = (β_0, β_1, ..., β_p)′, ε = (ε_1, ε_2, ..., ε_n)′ are the unobserved random errors, and X is the n × (p + 1) design matrix whose i-th row is (1, x_{1i}, x_{2i}, ..., x_{pi}).

All the previous developments are applicable.

Polynomial regression model: [Y_i | X_i = x_i] ∼ N(a + bx_i + c x_i², σ²_ε), independently, is a special case.
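For instance, a quadratic fit to the cars data is still a linear model in R (a sketch; I() protects the squared term inside the formula):

fm2 <- lm(dist ~ speed + I(speed^2), data = cars)  # a + b*speed + c*speed^2
summary(fm2)
AIC(lm(dist ~ speed, data = cars))                 # compare with the straight-line fit
AIC(fm2)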


Logistic Regression

Observations taken on two features - the covariate is continuous, say dosage of a drug (x), and the response (y) is binary, say subject alive/dead (we code it as 0/1).

Scatter plot of (x) and (y) does not give much insight!

[Scatter plot of Classification1training$Percent_Word_Money_Freq (x) against the binary Classification1training$Spam (y).]

Figure: Scatter plot of x and y(0/1) - not useful.

Not much descriptive statistics can be done.

Still - need some motivation!


Logistic Regression Models

In the simple linear regression model we have the assumption E(Y | X = x) = a + bx.

Now, for the logistic regression model, should we assume E(Y | X = x) = 1 × P(Y = 1 | X = x) + 0 × P(Y = 0 | X = x) = P(Y = 1 | X = x) = a + bx?? - meaningless:

0 ≤ P(Y = 1 | X = x) ≤ 1 but −∞ < a + bx < +∞ for b ≠ 0.

However, P(Y = 1 | X = x) = e^{a+bx} / (1 + e^{a+bx}) - absolutely meaningful.

x ↦ e^{a+bx} / (1 + e^{a+bx}) is the cdf of a logistic distribution - hence the name logistic regression.

logit(P(Y = 1 | X = x)) = log(odds for Y = 1) = log( P(Y = 1 | X = x) / P(Y = 0 | X = x) ) = log( P(Y = 1 | X = x) / (1 − P(Y = 1 | X = x)) ) = a + bx - hence the name logit regression.

If not coded using dummy variables - P(Y = "dead" | X = x) = e^{a+bx} / (1 + e^{a+bx}).

Reason?
(i) Very simple form.
(ii) Lots of similarity with the linear regression model.
(iii) The logistic regression model/logit regression model is highly successful!
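A small R sketch of this response curve; the values a = -4 and b = 1 are illustrative assumptions, not estimates from any data in these slides:

a <- -4; b <- 1                                 # illustrative parameter values (assumed)
curve(exp(a + b * x) / (1 + exp(a + b * x)),
      from = 0, to = 12,
      xlab = "x", ylab = "P(Y = 1 | X = x)")
# equivalently: curve(plogis(a + b * x), from = 0, to = 12)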


The logistic regression model is used in
(a) spam detection based on certain words and characters,
(b) malignant tumor detection based on certain cell profiles,
(c) loan defaulter detection based on personal/socio-economic and demographic profiles.

Difference with linear regression - no closed-form solution is available.

Simple logistic regression model:

[Y_1 = y_1, ..., Y_n = y_n | X_1 = x_1, ..., X_n = x_n] ∼ ∏_{i=1}^n [P(Y = 1 | X = x_i)]^{y_i} [1 − P(Y = 1 | X = x_i)]^{1−y_i} = ∏_{i=1}^n [e^{a+bx_i} / (1 + e^{a+bx_i})]^{y_i} [1 / (1 + e^{a+bx_i})]^{1−y_i}

Model parameters - a, b.

The model is nothing but a family of products of Bernoulli distributions indexed by the unknown parameters a, b.

More familiar specification - [Y_i | X_i = x_i] ∼ Ber(e^{a+bx_i} / (1 + e^{a+bx_i})), independently.
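In R this model is fitted with glm() and family = binomial (the logit link is the default); the sketch below uses simulated data with assumed true values a = -4, b = 1, since the slides do not supply a dataset:

set.seed(1)
x <- runif(100, 0, 12)                  # simulated covariate (assumption)
y <- rbinom(100, 1, plogis(-4 + x))     # Bernoulli responses with true a = -4, b = 1
fit <- glm(y ~ x, family = binomial)
coef(fit)                               # estimates of (a, b)
logLik(fit)                             # maximized Bernoulli log-likelihood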


The model is fitted using the maximum likelihood method.

Inferential goal - estimating the parameter vector β = (a, b)′.

The MLE of β is denoted by β̂ - unlike linear regression, there is no closed-form expression.

The MLE is calculated using a numerical algorithm - Fisher's scoring algorithm.

Often the algorithm may not converge - due to multicollinearity, sparseness and complete separation.

Multicollinearity: when the covariate/predictor variables are highly linearly correlated.

Sparseness: for some combinations of the covariate variables we do not get any data.

Complete separation: beyond some threshold value of the covariates only Y = 1 or only Y = 0 responses are obtained.

For the simple logistic regression model, instead of Fisher's scoring one often uses the Newton-Raphson method.

For the simple logistic regression model, the Newton-Raphson method becomes an iteratively reweighted least squares (IRLS) algorithm, as sketched below.

The IRLS form is highly useful since the calculation of least squares is relatively easy.
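A bare-bones IRLS sketch for the simple logistic model, on the same simulated data as above; this is a hand-rolled illustration of the idea, not R's internal implementation:

set.seed(1)
x <- runif(100, 0, 12)
y <- rbinom(100, 1, plogis(-4 + x))
X <- cbind(1, x)                       # design matrix with intercept column
beta <- c(0, 0)                        # starting value
for (it in 1:25) {
  eta <- as.vector(X %*% beta)         # linear predictor
  p   <- plogis(eta)                   # fitted probabilities
  w   <- p * (1 - p)                   # IRLS weights
  z   <- eta + (y - p) / w             # working response
  beta <- solve(t(X) %*% (w * X), t(X) %*% (w * z))  # weighted least squares step
}
beta                                   # matches coef(glm(y ~ x, family = binomial))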


Another inferential goal - testing for β.

Individual test of significance H_0: β_0 = 0 vs H_1: β_0 ≠ 0 (test of intercept).

Test statistic Z = β̂_0 / se(β̂_0).

The finite-sample null distribution is not available - the asymptotic null distribution (assuming the number of data points n is large) of the test statistic is N(0, 1) - the cutoff is obtained from the standard normal table.

Practitioners prefer the p-value - for this two-sided test, 2 P(Z > |Z_observed|) where Z ∼ N(0, 1).

Individual test of significance H_0: β_1 = 0 vs H_1: β_1 ≠ 0 (test of slope).

Test statistic Z = β̂_1 / se(β̂_1).

The asymptotic null distribution of the test statistic is N(0, 1) - the cutoff is obtained from the N(0, 1) distribution table.

Asymptotically approximate confidence intervals for the parameters β_0 and β_1 can be obtained by inverting the Z test statistics.

Goodness of fit measures.

We want something like R².
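In R, the Z statistics and their two-sided p-values appear directly in the glm summary; confint.default() gives the Wald intervals obtained by inverting them (a sketch on the simulated data from before):

set.seed(1)
x <- runif(100, 0, 12); y <- rbinom(100, 1, plogis(-4 + x))
fit <- glm(y ~ x, family = binomial)
summary(fit)$coefficients    # columns: Estimate, Std. Error, z value, Pr(>|z|)
confint.default(fit)         # Wald intervals: estimate +/- 1.96 * se (at 95%)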

26 / 29

Page 132: A Short Course in Linear and Logistic Regression...Linear Regression as Descriptive Measure Observations taken on two features - say height (x) and weight (y) of individuals. Situations

Logistic Regression Models

Another inferential goal - testing for β.

Individual test of significance H0 : β0 = 0 vs H1 : β0 6= 0 (test of intercept).

Test statistic Z =β0

se(β0).

Finite sample null distribution is not available - asymptotic null distribution(assuming no. of data n large) of test statistic ∼ N(0, 1) - Cutoff is obtainedusing standard normal table.

Practitioners prefer p-value - P(Z > |Zobserved |) where Z ∼ N(0, 1).

Individual test of significance H0 : β1 = 0 vs H1 : β1 6= 0 (test of slope).

Test statistic Z = β1

se(β1).

asymptotic null distribution of test statistic ∼ N(0, 1) - Cutoff is obtained usingN(0, 1)-distribution table.

Asymptotically approximate confidence intervals can be obtained for theparameters β0 and β1 inverting the Z test statistics.

Goodness of fit measures.

Want something like R2.

26 / 29
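A hedged sketch of the Wald computations just described, assuming the MLE β̂ has already been obtained (e.g. from the IRLS sketch earlier); standard errors come from the inverse Fisher information, and the helper name wald_inference is illustrative.

import numpy as np
from math import erfc, sqrt

def wald_inference(x, y, beta_hat):
    X = np.column_stack([np.ones_like(x), x])
    p = 1.0 / (1.0 + np.exp(-(X @ beta_hat)))
    w = p * (1.0 - p)
    fisher = (X.T * w) @ X                        # Fisher information X'WX
    se = np.sqrt(np.diag(np.linalg.inv(fisher)))  # asymptotic standard errors
    z = beta_hat / se                             # Wald Z statistics
    # two-sided p-values: P(|Z| > |z_obs|) = erfc(|z_obs| / sqrt(2))
    pvals = np.array([erfc(abs(zi) / sqrt(2.0)) for zi in z])
    # approximate 95% confidence intervals by inverting the Z statistics
    ci = np.column_stack([beta_hat - 1.96 * se, beta_hat + 1.96 * se])
    return z, pvals, ci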

Page 133: A Short Course in Linear and Logistic Regression...Linear Regression as Descriptive Measure Observations taken on two features - say height (x) and weight (y) of individuals. Situations

Logistic Regression Models

Another inferential goal - testing for β.

Individual test of significance H0 : β0 = 0 vs H1 : β0 6= 0 (test of intercept).

Test statistic Z =β0

se(β0).

Finite sample null distribution is not available - asymptotic null distribution(assuming no. of data n large) of test statistic ∼ N(0, 1) - Cutoff is obtainedusing standard normal table.

Practitioners prefer p-value - P(Z > |Zobserved |) where Z ∼ N(0, 1).

Individual test of significance H0 : β1 = 0 vs H1 : β1 6= 0 (test of slope).

Test statistic Z = β1

se(β1).

asymptotic null distribution of test statistic ∼ N(0, 1) - Cutoff is obtained usingN(0, 1)-distribution table.

Asymptotically approximate confidence intervals can be obtained for theparameters β0 and β1 inverting the Z test statistics.

Goodness of fit measures.

Want something like R2.

26 / 29

Page 134: A Short Course in Linear and Logistic Regression...Linear Regression as Descriptive Measure Observations taken on two features - say height (x) and weight (y) of individuals. Situations

Logistic Regression Models

Another inferential goal - testing for β.

Individual test of significance H0 : β0 = 0 vs H1 : β0 6= 0 (test of intercept).

Test statistic Z =β0

se(β0).

Finite sample null distribution is not available - asymptotic null distribution(assuming no. of data n large) of test statistic ∼ N(0, 1) - Cutoff is obtainedusing standard normal table.

Practitioners prefer p-value - P(Z > |Zobserved |) where Z ∼ N(0, 1).

Individual test of significance H0 : β1 = 0 vs H1 : β1 6= 0 (test of slope).

Test statistic Z = β1

se(β1).

asymptotic null distribution of test statistic ∼ N(0, 1) - Cutoff is obtained usingN(0, 1)-distribution table.

Asymptotically approximate confidence intervals can be obtained for theparameters β0 and β1 inverting the Z test statistics.

Goodness of fit measures.

Want something like R2.

26 / 29

Page 135: A Short Course in Linear and Logistic Regression...Linear Regression as Descriptive Measure Observations taken on two features - say height (x) and weight (y) of individuals. Situations

Logistic Regression Models

Another inferential goal - testing for β.

Individual test of significance H0 : β0 = 0 vs H1 : β0 6= 0 (test of intercept).

Test statistic Z =β0

se(β0).

Finite sample null distribution is not available - asymptotic null distribution(assuming no. of data n large) of test statistic ∼ N(0, 1) - Cutoff is obtainedusing standard normal table.

Practitioners prefer p-value - P(Z > |Zobserved |) where Z ∼ N(0, 1).

Individual test of significance H0 : β1 = 0 vs H1 : β1 6= 0 (test of slope).

Test statistic Z = β1

se(β1).

asymptotic null distribution of test statistic ∼ N(0, 1) - Cutoff is obtained usingN(0, 1)-distribution table.

Asymptotically approximate confidence intervals can be obtained for theparameters β0 and β1 inverting the Z test statistics.

Goodness of fit measures.

Want something like R2.

26 / 29

Page 136: A Short Course in Linear and Logistic Regression...Linear Regression as Descriptive Measure Observations taken on two features - say height (x) and weight (y) of individuals. Situations

Logistic Regression Models

Another inferential goal - testing for β.

Individual test of significance H0 : β0 = 0 vs H1 : β0 6= 0 (test of intercept).

Test statistic Z =β0

se(β0).

Finite sample null distribution is not available - asymptotic null distribution(assuming no. of data n large) of test statistic ∼ N(0, 1) - Cutoff is obtainedusing standard normal table.

Practitioners prefer p-value - P(Z > |Zobserved |) where Z ∼ N(0, 1).

Individual test of significance H0 : β1 = 0 vs H1 : β1 6= 0 (test of slope).

Test statistic Z = β1

se(β1).

asymptotic null distribution of test statistic ∼ N(0, 1) - Cutoff is obtained usingN(0, 1)-distribution table.

Asymptotically approximate confidence intervals can be obtained for theparameters β0 and β1 inverting the Z test statistics.

Goodness of fit measures.

Want something like R2.

26 / 29

Page 137: A Short Course in Linear and Logistic Regression...Linear Regression as Descriptive Measure Observations taken on two features - say height (x) and weight (y) of individuals. Situations

Logistic Regression Models

Another inferential goal - testing for β.

Individual test of significance H0 : β0 = 0 vs H1 : β0 6= 0 (test of intercept).

Test statistic Z =β0

se(β0).

Finite sample null distribution is not available - asymptotic null distribution(assuming no. of data n large) of test statistic ∼ N(0, 1) - Cutoff is obtainedusing standard normal table.

Practitioners prefer p-value - P(Z > |Zobserved |) where Z ∼ N(0, 1).

Individual test of significance H0 : β1 = 0 vs H1 : β1 6= 0 (test of slope).

Test statistic Z = β1

se(β1).

asymptotic null distribution of test statistic ∼ N(0, 1) - Cutoff is obtained usingN(0, 1)-distribution table.

Asymptotically approximate confidence intervals can be obtained for theparameters β0 and β1 inverting the Z test statistics.

Goodness of fit measures.

Want something like R2.

26 / 29

Page 138: A Short Course in Linear and Logistic Regression...Linear Regression as Descriptive Measure Observations taken on two features - say height (x) and weight (y) of individuals. Situations

Logistic Regression Models

Another inferential goal - testing for β.

Individual test of significance H0 : β0 = 0 vs H1 : β0 6= 0 (test of intercept).

Test statistic Z =β0

se(β0).

Finite sample null distribution is not available - asymptotic null distribution(assuming no. of data n large) of test statistic ∼ N(0, 1) - Cutoff is obtainedusing standard normal table.

Practitioners prefer p-value - P(Z > |Zobserved |) where Z ∼ N(0, 1).

Individual test of significance H0 : β1 = 0 vs H1 : β1 6= 0 (test of slope).

Test statistic Z = β1

se(β1).

asymptotic null distribution of test statistic ∼ N(0, 1) - Cutoff is obtained usingN(0, 1)-distribution table.

Asymptotically approximate confidence intervals can be obtained for theparameters β0 and β1 inverting the Z test statistics.

Goodness of fit measures.

Want something like R2.

26 / 29

Page 139: A Short Course in Linear and Logistic Regression...Linear Regression as Descriptive Measure Observations taken on two features - say height (x) and weight (y) of individuals. Situations

Logistic Regression Models

Another inferential goal - testing for β.

Individual test of significance H0 : β0 = 0 vs H1 : β0 6= 0 (test of intercept).

Test statistic Z =β0

se(β0).

Finite sample null distribution is not available - asymptotic null distribution(assuming no. of data n large) of test statistic ∼ N(0, 1) - Cutoff is obtainedusing standard normal table.

Practitioners prefer p-value - P(Z > |Zobserved |) where Z ∼ N(0, 1).

Individual test of significance H0 : β1 = 0 vs H1 : β1 6= 0 (test of slope).

Test statistic Z = β1

se(β1).

asymptotic null distribution of test statistic ∼ N(0, 1) - Cutoff is obtained usingN(0, 1)-distribution table.

Asymptotically approximate confidence intervals can be obtained for theparameters β0 and β1 inverting the Z test statistics.

Goodness of fit measures.

Want something like R2.

26 / 29

Page 140: A Short Course in Linear and Logistic Regression...Linear Regression as Descriptive Measure Observations taken on two features - say height (x) and weight (y) of individuals. Situations

Logistic Regression Models

Another inferential goal - testing for β.

Individual test of significance H0 : β0 = 0 vs H1 : β0 6= 0 (test of intercept).

Test statistic Z =β0

se(β0).

Finite sample null distribution is not available - asymptotic null distribution(assuming no. of data n large) of test statistic ∼ N(0, 1) - Cutoff is obtainedusing standard normal table.

Practitioners prefer p-value - P(Z > |Zobserved |) where Z ∼ N(0, 1).

Individual test of significance H0 : β1 = 0 vs H1 : β1 6= 0 (test of slope).

Test statistic Z = β1

se(β1).

asymptotic null distribution of test statistic ∼ N(0, 1) - Cutoff is obtained usingN(0, 1)-distribution table.

Asymptotically approximate confidence intervals can be obtained for theparameters β0 and β1 inverting the Z test statistics.

Goodness of fit measures.

Want something like R2.

26 / 29

Page 141: A Short Course in Linear and Logistic Regression...Linear Regression as Descriptive Measure Observations taken on two features - say height (x) and weight (y) of individuals. Situations

Logistic Regression Models

Another inferential goal - testing for β.

Individual test of significance H0 : β0 = 0 vs H1 : β0 6= 0 (test of intercept).

Test statistic Z =β0

se(β0).

Finite sample null distribution is not available - asymptotic null distribution(assuming no. of data n large) of test statistic ∼ N(0, 1) - Cutoff is obtainedusing standard normal table.

Practitioners prefer p-value - P(Z > |Zobserved |) where Z ∼ N(0, 1).

Individual test of significance H0 : β1 = 0 vs H1 : β1 6= 0 (test of slope).

Test statistic Z = β1

se(β1).

asymptotic null distribution of test statistic ∼ N(0, 1) - Cutoff is obtained usingN(0, 1)-distribution table.

Asymptotically approximate confidence intervals can be obtained for theparameters β0 and β1 inverting the Z test statistics.

Goodness of fit measures.

Want something like R2.

26 / 29

Page 142: A Short Course in Linear and Logistic Regression...Linear Regression as Descriptive Measure Observations taken on two features - say height (x) and weight (y) of individuals. Situations

Logistic Regression Models

Deviance measure : D_fitted = −2 ln L(β̂_mle | y, X).

Null model means only the intercept term - no regressors.

D_null = −2 ln L(β̂0,mle | y, X), where β̂0,mle is the MLE under the intercept-only model.

D_null − D_fitted ≥ 0.

If D_null − D_fitted is very large then we can reject the hypothesis of no regression (under H0 the difference is asymptotically χ²-distributed with degrees of freedom equal to the number of slope coefficients, which gives the cutoff).

H0 : all coefficients except β0 are 0 vs H1 : not H0 (test of whether regression is needed at all).

This test is the analogue of the F-test in linear regression models.

Can construct a pseudo-R-squared based on deviance : R²_L = (D_null − D_fitted) / D_null.

R²_L - a larger value indicates a better fit.

27 / 29
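The deviance quantities can be computed directly from fitted probabilities. The sketch below assumes the simple model with one covariate and uses the fact that the intercept-only MLE of P(Y = 1) is the sample proportion ȳ; names are illustrative.

import numpy as np

def bernoulli_loglik(y, p):
    # log-likelihood of binary responses y at fitted probabilities p
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def deviance_summary(x, y, beta_hat):
    X = np.column_stack([np.ones_like(x), x])
    p_fit = 1.0 / (1.0 + np.exp(-(X @ beta_hat)))   # fitted model
    p_null = np.full_like(p_fit, y.mean())          # intercept-only MLE is ybar
    d_fitted = -2.0 * bernoulli_loglik(y, p_fit)
    d_null = -2.0 * bernoulli_loglik(y, p_null)
    r2_l = (d_null - d_fitted) / d_null             # deviance pseudo R-squared
    # d_null - d_fitted is compared with a chi-squared(1) cutoff
    # (one slope coefficient in the simple model) to test "no regression".
    return d_fitted, d_null, r2_l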

Logistic Regression Models

Multiple logistic regression :

[Y1 = y1, · · · , Yn = yn | X = x] ∼ ∏_{i=1}^{n} [ e^{ηi} / (1 + e^{ηi}) ]^{yi} · [ 1 / (1 + e^{ηi}) ]^{1−yi}, where ηi = β0 + β1x1i + β2x2i + · · · + βpxpi.

Model parameters - β0, β1, β2, · · · , βp.

Everything else is the same as in simple logistic regression.

One additional issue - multicollinearity or aliasing.

multicollinearity : some of the regressors/predictors are highly linearly correlated.

multicollinearity makes some estimates very unreliable!

28 / 29
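In practice the multiple logistic model is fitted with library routines rather than by hand. Assuming the statsmodels package is available, a sketch such as the following (with simulated, illustrative data) fits the model and reports Wald p-values and McFadden's pseudo-R², which coincides with (D_null − D_fitted)/D_null.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 500
x1, x2 = rng.normal(size=n), rng.normal(size=n)          # two covariates
y = rng.binomial(1, 1 / (1 + np.exp(-(0.3 + 0.8 * x1 - 1.1 * x2))))

X = sm.add_constant(np.column_stack([x1, x2]))           # prepend intercept
res = sm.Logit(y, X).fit()                               # Newton-type fit
print(res.params)      # MLEs of beta0, beta1, beta2
print(res.bse)         # asymptotic standard errors
print(res.pvalues)     # Wald z-test p-values
print(res.prsquared)   # McFadden pseudo R-squared = (D_null - D_fitted)/D_null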

Logistic Regression Models

Calculate variance inflation factors VIFj for each of the p regressors - a sketch is given below.

Perform a multiple linear regression of the j-th covariate on the remaining (p − 1) covariates - calculate the resulting R²_j (R-squared).

VIFj = 1 / (1 − R²_j).

A high VIF means a highly correlated covariate - VIFj > 5 is high (rule of thumb).

Unlike linear regression there are different notions of residuals - deviance residuals, Pearson residuals and Anscombe residuals.

Diagnostic plots based on them can be devised, analogous to those for linear regression.

29 / 29
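A self-contained numpy sketch of the VIF recipe above, together with Pearson and deviance residuals; the function names are illustrative, not from the slides.

import numpy as np

def vif(X):
    # X: n x p covariate matrix WITHOUT the intercept column.
    # For each j, regress column j on the other p-1 columns (plus an
    # intercept) and convert the resulting R-squared into a VIF.
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        xj = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, xj, rcond=None)
        resid = xj - others @ coef
        r2_j = 1.0 - (resid @ resid) / np.sum((xj - xj.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2_j)      # infinite under exact collinearity
    return out

def pearson_residuals(y, p_hat):
    # standardized raw residuals of a fitted logistic regression
    return (y - p_hat) / np.sqrt(p_hat * (1.0 - p_hat))

def deviance_residuals(y, p_hat):
    # signed square roots of each observation's deviance contribution
    d2 = -2.0 * (y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat))
    return np.sign(y - p_hat) * np.sqrt(d2)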
