
Regression analysis

Linear regression Logistic regression


Relationship and association


Straight line

[Figure: a straight line through data points, Hip (cm) on the x-axis; a 1 cm increase in Hip corresponds to a change of -0.0008 in BMI]

The general equation of a straight line:

Y = b0 + b1*X

For the line in the figure: BMI = 100.0 - 0.0008*X

b1 = (Y2 - Y1) / (X2 - X1)   (the slope)

b0 = intersection with the y-axis (the intercept)

BMI = b0 + b1*HIP


Best straight line?


Best straight line!

[Figure: scatter plot with a fitted line; the vertical distance from a point (X1, Y1) to the line is its residual]

e1 = Y1 - Ŷ1, and in general ei = Yi - Ŷi for each of the N points.

Least squares estimation: b0 and b1 are chosen so that the sum of the squared residuals, Σ ei² for i = 1 to N, is as small as possible.
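The slides use SPSS, but the idea is easy to sketch in Python/numpy with made-up Hip/BMI numbers (not the course data): the function evaluates the sum of squared residuals for a candidate line, and least squares picks the b0 and b1 that make it smallest.

```python
import numpy as np

# Made-up hip circumference (cm) and BMI (kg/m2) values, for illustration only
hip = np.array([94.0, 97.0, 99.0, 102.0, 105.0])
bmi = np.array([20.1, 21.4, 22.0, 23.2, 24.1])

def sum_of_squared_residuals(b0, b1, x, y):
    """Residuals ei = Yi - Yhat_i for the candidate line Yhat = b0 + b1*x."""
    residuals = y - (b0 + b1 * x)
    return np.sum(residuals ** 2)

# Least squares picks the (b0, b1) with the smallest sum of squared residuals;
# here we just compare two candidate lines.
print(sum_of_squared_residuals(-12.6, 0.35, hip, bmi))
print(sum_of_squared_residuals(-10.0, 0.30, hip, bmi))
```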


Simple linear regression

1. Is the association linear?

[Figure: scatter plot used to judge visually whether the association is linear]


Simple linear regression

1. Is the association linear?
2. Describe the association: what are b0 and b1?

BMI = -12.6 kg/m2 + 0.35 kg/m3 * Hip

b1 = Σ(Xi - X̄)(Yi - Ȳ) / Σ(Xi - X̄)²   (sums over i = 1 to n)

b0 = Ȳ - b1*X̄
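A minimal numpy sketch of the two formulas above, again on made-up data (the course data set is not part of the transcript); np.polyfit is used only as a cross-check.

```python
import numpy as np

# Made-up Hip (cm) and BMI (kg/m2) values, for illustration only
x = np.array([94.0, 97.0, 99.0, 102.0, 105.0])   # Hip
y = np.array([20.1, 21.4, 22.0, 23.2, 24.1])     # BMI

# b1 = sum((Xi - Xbar)(Yi - Ybar)) / sum((Xi - Xbar)^2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
# b0 = Ybar - b1 * Xbar
b0 = y.mean() - b1 * x.mean()

print(b0, b1)
# np.polyfit fits the same least squares line (highest degree first: [b1, b0])
print(np.polyfit(x, y, 1))
```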


Simple linear regression

1. Is the association linear?
2. Describe the association
3. Is the slope significantly different from 0? Let SPSS help!

Coefficientsa

Model              B        Std. Error   Beta     t        Sig.
1  (Constant)    -12.581     2.331                -5.396   .000
   Hip              .345      .023        .565    15.266   .000

a. Dependent Variable: BMI
(B, Std. Error: unstandardized coefficients; Beta: standardized coefficient)
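The slides rely on SPSS for this test; as a rough equivalent, here is a statsmodels sketch (our choice of library, with placeholder hip/bmi arrays since the data behind the table are not in the transcript) that prints B, the standard error, t and the p-value in the same way.

```python
import numpy as np
import statsmodels.api as sm

# Placeholder data, not the data behind the SPSS table above
hip = np.array([94.0, 97.0, 99.0, 102.0, 105.0, 101.0, 96.0, 108.0])
bmi = np.array([20.1, 21.4, 22.0, 23.2, 24.1, 23.0, 21.0, 25.5])

X = sm.add_constant(hip)          # adds the intercept column (b0)
model = sm.OLS(bmi, X).fit()      # least squares fit
print(model.summary())            # coefficient, Std. Error, t and p-value, as in SPSS
```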


Simple linear regression

1. Is the association linear?
2. Describe the association
3. Is the slope significantly different from 0?
4. How good is the fit?

How far are the data points from the line on average?

r = Σ(Xi - X̄)(Yi - Ȳ) / sqrt( Σ(Xi - X̄)² * Σ(Yi - Ȳ)² ),   with -1 ≤ r ≤ 1
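A small sketch of r computed both from the formula above and with scipy.stats.pearsonr, on placeholder data (not the course data); r**2 anticipates the goodness-of-fit slides that follow.

```python
import numpy as np
from scipy.stats import pearsonr

x = np.array([94.0, 97.0, 99.0, 102.0, 105.0])   # placeholder Hip values
y = np.array([20.1, 21.4, 22.0, 23.2, 24.1])     # placeholder BMI values

# r by the formula above
r_manual = np.sum((x - x.mean()) * (y - y.mean())) / np.sqrt(
    np.sum((x - x.mean()) ** 2) * np.sum((y - y.mean()) ** 2))

r, p_value = pearsonr(x, y)       # same r, plus a two-tailed p-value
print(r_manual, r, r ** 2)        # r**2 is the goodness of fit
```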


The Correlation Coefficient, r

[Four example scatter plots: r = 0, r = 1, r = 0.7 and r = -0.5]


r2 – Goodness of fit

How much of the variation can be explained by the model?

[Four example scatter plots with fitted lines: R2 = 0, R2 = 1, R2 = 0.5 and R2 = 0.2]


Multiple linear regression

Could the waist measure describe some of the variation in BMI?

BMI = 1.3 kg/m2 + 0.42 kg/m3 * Waist

Or even better:

BMI = b0 + b1*HIP + b2*WST

BMI = -12.2 + 0.25*HIP + 0.17*WST
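A sketch of a two-predictor fit (BMI on Hip and Waist) with statsmodels on placeholder data; the coefficients on the slide (-12.2, 0.25, 0.17) come from the course data set, which is not reproduced here.

```python
import numpy as np
import statsmodels.api as sm

# Placeholder measurements, for illustration only
hip   = np.array([94.0, 97.0, 99.0, 102.0, 105.0, 101.0, 96.0, 108.0])
waist = np.array([78.0, 82.0, 85.0, 88.0, 93.0, 90.0, 80.0, 97.0])
bmi   = np.array([20.1, 21.4, 22.0, 23.2, 24.1, 23.0, 21.0, 25.5])

X = sm.add_constant(np.column_stack([hip, waist]))   # columns: constant, HIP, WST
fit = sm.OLS(bmi, X).fit()
print(fit.params)                     # [b0, b1, b2]
print(fit.rsquared, fit.rsquared_adj) # fit quality, see the next slide
```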


Multiple linear regression

Adding age: adj R2 = 0.352

Adding thigh: adj R2 = 0.352?

Coefficientsa

Model             B        Std. Error   Beta     t        Sig.    95% Confidence Interval for B
1  (Constant)    -9.001     2.449                -3.676   .000    (-13.813, -4.190)
   Waist            .168     .043        .201     3.923   .000    (.084, .252)
   Hip              .252     .031        .411     8.012   .000    (.190, .313)
   Age             -.064     .018       -.126    -3.492   .001    (-.101, -.028)

a. Dependent Variable: BMI

Coefficientsa

Model             B        Std. Error   Beta     t        Sig.    95% Confidence Interval for B
1  (Constant)     3.581     1.784                 2.007   .045    (.075, 7.086)
   Waist            .168     .043        .201     3.923   .000    (.084, .252)
   Age             -.064     .018       -.126    -3.492   .001    (-.101, -.028)
   Thigh            .252     .031        .411     8.012   .000    (.190, .313)

a. Dependent Variable: BMI
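The adjusted R2 reported above penalises extra predictors, so adding a variable only helps if it explains enough additional variation. A minimal sketch of the standard adjustment formula (the formula itself is not shown on the slides, and the numbers below are only an example):

```python
def adjusted_r2(r2, n, p):
    """Standard adjustment: r2 = ordinary R squared, n = observations, p = predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Example: R2 = 0.36 with 600 observations and 3 predictors
print(adjusted_r2(0.36, 600, 3))   # slightly below 0.36
```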


Assumptions

1. The dependent variable must be metric continuous
2. The independent variables must be continuous or ordinal
3. Linear relationship between the dependent and all independent variables
4. Residuals must have a constant spread
5. Residuals are normally distributed (assumptions 4 and 5 can be checked as in the sketch after this list)
6. Independent variables are not perfectly correlated with each other
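A rough sketch of how assumptions 4 and 5 could be checked on the residuals of a fitted model, using statsmodels and scipy on placeholder data; the particular checks are our choice, not prescribed by the slides.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import shapiro

# Placeholder data; in practice use the residuals of your own fitted model
x = np.array([94.0, 97.0, 99.0, 102.0, 105.0, 101.0, 96.0, 108.0])
y = np.array([20.1, 21.4, 22.0, 23.2, 24.1, 23.0, 21.0, 25.5])

fit = sm.OLS(y, sm.add_constant(x)).fit()
residuals = fit.resid
fitted = fit.fittedvalues

# Assumption 4 (constant spread): a crude check is whether the size of the
# residuals grows with the fitted values.
print(np.corrcoef(fitted, np.abs(residuals))[0, 1])

# Assumption 5 (normality of residuals): Shapiro-Wilk test
stat, p = shapiro(residuals)
print(stat, p)
```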

Non-parametric correlation



Ranked Correlation

Kendall's τ and Spearman's rs

The correlation coefficient lies between -1 and 1, where -1 is a perfect inverse correlation, 0 means no correlation, and 1 means a perfect correlation.

Pearson is the correlation method for normally distributed data. Remember the assumptions:

1. The dependent variable must be metric continuous
2. The independent variables must be continuous or ordinal
3. Linear relationship between the dependent and all independent variables
4. Residuals must have a constant spread
5. Residuals are normally distributed


Kendall's τ – An example


Kendall's τ – An example

S = P - Q   (P = number of concordant pairs, Q = number of discordant pairs)

τ = S / (n*(n-1)/2)


Spearman – the same example

d²:  1  4  9  1  1  1  9  9  1  16   (Σd² = 52, n = 10)

rs = 1 - 6*Σd² / (n³ - n) = 1 - 6*52 / (10³ - 10) = 0.6848
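A quick check of the example: plugging the slide's d² values into the formula reproduces 0.6848. With the raw data (not included in the transcript), scipy's spearmanr and kendalltau compute the same statistics directly.

```python
# d^2 values from the slide; they sum to 52, with n = 10 pairs of ranks
d_squared = [1, 4, 9, 1, 1, 1, 9, 9, 1, 16]
n = 10

rs = 1 - 6 * sum(d_squared) / (n ** 3 - n)
print(rs)   # 0.6848..., matching the slide

# With raw data a and b (placeholders, not in the transcript):
# from scipy.stats import spearmanr, kendalltau
# rho, p = spearmanr(a, b)
# tau, p = kendalltau(a, b)
```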


Correlation in SPSS


Correlation in SPSS

Correlations

                           a        b
a   Pearson Correlation    1        .685*
    Sig. (2-tailed)                 .029
    N                      10       10
b   Pearson Correlation    .685*    1
    Sig. (2-tailed)        .029
    N                      10       10

*. Correlation is significant at the 0.05 level (2-tailed).

Correlations

                                               a        b
Kendall's tau_b   a  Correlation Coefficient   1.000    .511*
                     Sig. (2-tailed)           .        .040
                     N                         10       10
                  b  Correlation Coefficient   .511*    1.000
                     Sig. (2-tailed)           .040     .
                     N                         10       10
Spearman's rho    a  Correlation Coefficient   1.000    .685*
                     Sig. (2-tailed)           .        .029
                     N                         10       10
                  b  Correlation Coefficient   .685*    1.000
                     Sig. (2-tailed)           .029     .
                     N                         10       10

*. Correlation is significant at the 0.05 level (2-tailed).

Logistic regression



Logistic Regression

• What if the dependent variable is categorical, especially binary?

• Could we use some interpolation method?

• Linear regression cannot help us.


The sigmoidal curve

p = 1 / (1 + e^-z),   z = β0 + β1*x1 + ... + βn*xn

[Figure: sigmoidal curve, p plotted against x from -6 to 6, for β0 = 0; β1 = 1]


The sigmoidal curve

p = 1 / (1 + e^-z),   z = β0 + β1*x1 + ... + βn*xn

[Figure: sigmoidal curves, p against x, for β0 = 0, β0 = 2 and β0 = -2, all with β1 = 1]

• The intercept basically just ‘scales’ the input variable


The sigmoidal curve

p = 1 / (1 + e^-z),   z = β0 + β1*x1 + ... + βn*xn

[Figure: sigmoidal curves, p against x, for β1 = 1, β1 = 2 and β1 = 0.5, all with β0 = 0]

• The intercept basically just ‘scales’ the input variable

• Large regression coefficient → risk factor strongly influences the probability


The sigmoidal curve

p = 1 / (1 + e^-z),   z = β0 + β1*x1 + ... + βn*xn

[Figure: sigmoidal curves, p against x, for β1 = 1 and β1 = -1, with β0 = 0]

• The intercept basically just ‘scales’ the input variable

• Large regression coefficient → risk factor strongly influences the probability

• Positive regression coefficient → risk factor increases the probability

• Logistic regression uses maximum likelihood estimation, not least squares estimation
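A minimal numpy sketch of the sigmoidal curve for the parameter settings shown on the last four slides, to make the bullet points concrete (β0 shifts the curve, β1 controls steepness and direction):

```python
import numpy as np

def sigmoid(x, b0, b1):
    """p = 1 / (1 + exp(-z)) with z = b0 + b1*x."""
    return 1.0 / (1.0 + np.exp(-(b0 + b1 * x)))

x = np.linspace(-6, 6, 7)
# Parameter settings from the slides above
for b0, b1 in [(0, 1), (2, 1), (-2, 1), (0, 2), (0, 0.5), (0, -1)]:
    print(b0, b1, np.round(sigmoid(x, b0, b1), 3))
```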


Does age influence the diagnosis? Continuous independent variable

Variables in the Equation

                     B       S.E.    Wald      df   Sig.   Exp(B)   95% C.I. for Exp(B)
Step 1a  Age          .109    .010   108.745   1    .000   1.115    (1.092, 1.138)
         Constant   -4.213    .423    99.097   1    .000    .015

a. Variable(s) entered on step 1: Age.

z = B0 + B1*age,   p = 1 / (1 + e^-z)
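A small check of the table above in numpy: Exp(B) is just e^B, and a predicted probability can be computed for any age (the age of 40 below is purely illustrative, not from the slides).

```python
import numpy as np

b0, b1 = -4.213, 0.109      # Constant and Age from the SPSS table above

print(np.exp(b1))           # Exp(B) = 1.115, the odds ratio per year of age

# Predicted probability at an illustrative age (not from the slides)
age = 40
z = b0 + b1 * age
print(1 / (1 + np.exp(-z)))
```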


Does previous intake of OCP influence the diagnosis? Categorical independent variable

Variables in the Equation

                     B       S.E.    Wald     df   Sig.   Exp(B)   95% C.I. for Exp(B)
Step 1a  OCP(1)      -.311    .180    2.979   1    .084    .733    (.515, 1.043)
         Constant     .233    .123    3.583   1    .058   1.263

a. Variable(s) entered on step 1: OCP.

z = B0 + B1*OCP,   p = 1 / (1 + e^-z)

If OCP = 1:  p(Y=1) = 1 / (1 + e^-(B0 + B1)) = 1 / (1 + e^-(0.233 - 0.311)) = 0.4805

If OCP = 0:  p(Y=1) = 1 / (1 + e^-B0) = 1 / (1 + e^-0.233) = 0.5580


Odds ratio

odds:  o = p / (1 - p) = e^z

odds ratio = e^(B0 + B1) / e^B0 = e^B1 = e^-0.311 = 0.7327
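A numeric check of the OCP slide and the odds ratio above, using the SPSS coefficients: the two predicted probabilities, the ratio of their odds, and e^B1 all agree.

```python
import numpy as np

b0, b1 = 0.233, -0.311                  # Constant and OCP(1) from the SPSS table

p_ocp1 = 1 / (1 + np.exp(-(b0 + b1)))   # 0.4805
p_ocp0 = 1 / (1 + np.exp(-b0))          # 0.5580

odds1 = p_ocp1 / (1 - p_ocp1)
odds0 = p_ocp0 / (1 - p_ocp0)
print(p_ocp1, p_ocp0, odds1 / odds0, np.exp(b1))   # odds ratio = e^B1 = 0.7327
```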


Multiple logistic regression

Variables in the Equation

                     B       S.E.    Wald      df   Sig.   Exp(B)   95% C.I. for Exp(B)
Step 1a  Age          .123    .011   115.343   1    .000   1.131    (1.106, 1.157)
         BMI          .083    .019    18.732   1    .000   1.087    (1.046, 1.128)
         OCP          .528    .219     5.808   1    .016   1.695    (1.104, 2.603)
         Constant   -6.974    .762    83.777   1    .000    .001

a. Variable(s) entered on step 1: Age, BMI, OCP.

z = B0 + B1*OCP + B2*age + B3*BMI,   p = 1 / (1 + e^-z)


Predicting the diagnosis by logistic regression

What is the probability that the tumor of a 50-year-old woman who has been using OCP and has a BMI of 26 is malignant?

z = -6.974 + 0.123*50 + 0.083*26 + 0.528*1 = 1.862
p = 1 / (1 + e^-1.862) = 0.866
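The same calculation in numpy, using the OCP coefficient 0.528 from the table below:

```python
import numpy as np

# Coefficients from the SPSS table (Constant, Age, BMI, OCP)
b0, b_age, b_bmi, b_ocp = -6.974, 0.123, 0.083, 0.528

z = b0 + b_age * 50 + b_bmi * 26 + b_ocp * 1
p = 1 / (1 + np.exp(-z))
print(z, p)    # z = 1.862, p is approximately 0.87
```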

Variables in the Equation

                     B       S.E.    Wald      df   Sig.   Exp(B)   95% C.I. for Exp(B)
Step 1a  Age          .123    .011   115.343   1    .000   1.131    (1.106, 1.157)
         BMI          .083    .019    18.732   1    .000   1.087    (1.046, 1.128)
         OCP          .528    .219     5.808   1    .016   1.695    (1.104, 2.603)
         Constant   -6.974    .762    83.777   1    .000    .001

a. Variable(s) entered on step 1: Age, BMI, OCP.
