Primer on Statistics for Interventional Cardiologists Giuseppe Sangiorgi, MD Pierfrancesco Agostoni, MD Giuseppe Biondi-Zoccai, MD


Post on 30-Dec-2015

Page 1: What you will learn

Primer on Statistics for Interventional Cardiologists

Giuseppe Sangiorgi, MD; Pierfrancesco Agostoni, MD; Giuseppe Biondi-Zoccai, MD

Page 2: What you will learn

What you will learn

• Introduction

• Basics

• Descriptive statistics

• Probability distributions

• Inferential statistics

• Finding differences in mean between two groups

• Finding differences in mean between more than 2 groups

• Linear regression and correlation for bivariate analysis

• Analysis of categorical data (contingency tables)

• Analysis of time-to-event data (survival analysis)

• Advanced statistics at a glance

• Conclusions and take home messages

Page 3: What you will learn

What you will learn

• Linear regression and correlation for bivariate analysis

– Simple linear regression

– Regression diagnostics

– Correlation analysis

– Non-parametric alternatives: Spearman rho

Page 4: What you will learn

How can I assess the quantitative impact of dilation pressure during stenting on final minimum lumen diameter?

In other words, can I quantitatively predict the change in a dependent variable given specific changes in an independent variable?

Regression

Page 5: What you will learn

[Figure: scatterplot of minimum lumen diameter (mm) against dilation pressure during stenting (ATM)]

Plotting the data beforehand is pivotal

Page 6: What you will learn

• We cannot define a specific mathematical function (eg F=m*a): there is no exact, deterministic relationship

• Regression describes a statistical relationship, in which a given value of the independent variable corresponds to a distribution of values of the dependent variable rather than to a single value

Regression

Page 7: What you will learn

Regression analysis

• It models the relationship between a continuous dependent variable and a continuous independent variable

• The dependent variable in the regression equation is modeled as a function of the independent variable, a corresponding parameter (constant), and an error term (a random variable representing unexplained variation in the dependent variable)

• Parameters are estimated so as to give a "best fit" of the data, by means of the least squares method
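The least squares fit described above can be sketched in a few lines of Python. This is only an illustration: the pressure and MLD values are invented for the example (they are not data from the slides), and the helper names `least_squares` and `sum_squared_errors` are hypothetical.

```python
# Hypothetical illustrative data (NOT from the slides):
# dilation pressure in atm (x) and final minimum lumen diameter in mm (y).
pressure = [8.0, 10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0]
mld = [2.5, 2.3, 2.9, 2.6, 3.1, 2.8, 3.3, 3.0]


def least_squares(x, y):
    """Return (intercept, slope) of the line minimizing the squared residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sum_squared_errors(x, y, intercept, slope):
    """Sum of squared differences between observed and estimated values."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


intercept, slope = least_squares(pressure, mld)
sse_best = sum_squared_errors(pressure, mld, intercept, slope)
# Any other line (here an arbitrary alternative) gives a larger sum of squares.
sse_other = sum_squared_errors(pressure, mld, 2.0, 0.06)

print(f"MLD = {intercept:.3f} + {slope:.4f} * pressure")
print(f"SSE of fitted line: {sse_best:.4f}, SSE of alternative line: {sse_other:.4f}")
```

The comparison with an arbitrary alternative line is there only to make the "best fit" idea concrete: the fitted line has the smallest sum of squared errors among all straight lines.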

Page 8: What you will learn

Linear regression

Independent variable

Distribution of the dependent variable

Average of the distribution of values of the dependent variable

Regression line

Page 9: What you will learn

• Through regression I can estimate the average value of the dependent variable given a specific value of the independent variable

• To do it, I need a specific model:

MLD = constant + β * dilation pressure

where β is the slope (angular coefficient) and shows the change in Y (MLD) given a unit change of X (dilation pressure)

• β is the parameter to assess, in order to appraise whether it is different from zero (ie if MLD steadily changes given a change in dilation pressure)

• How can we estimate β?

Linear regression

Page 10: What you will learn

Which of the different possible lines that I can graphically trace and compute is the best regression line?

It can be intuitively understood that it is the line that minimizes the differences, summed after squaring, between observed values (yi) and estimated values (yi')

[Figure: scatterplot with a regression line; y marks an observed value and y' the corresponding estimated value]

Linear regression

Page 11: What you will learn

• Linear regression analysis computes a statistical test to assess whether the coefficient of the independent variable is significantly different from zero

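• If the test has a probability value lower than the chosen significance threshold (p<0.050), the coefficient is considered significantly different from zero, ie there is evidence of a linear association; this alone does not prove that the model assumptions hold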
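As a rough sketch of the test on the coefficient, the slope can be divided by its standard error to obtain a t statistic with n - 2 degrees of freedom. The data below are invented for illustration, the function name `slope_t_test` is hypothetical, and the critical value 2.447 is the two-sided 5% value of the t distribution with 6 degrees of freedom taken from standard tables; a statistical package would report the exact p value instead.

```python
import math

# Hypothetical illustrative data (NOT from the slides).
pressure = [8.0, 10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0]
mld = [2.5, 2.3, 2.9, 2.6, 3.1, 2.8, 3.3, 3.0]


def slope_t_test(x, y):
    """Return (slope, standard error of slope, t statistic, degrees of freedom)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares around the fitted line.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    df = n - 2
    se_slope = math.sqrt(sse / df / sxx)
    return slope, se_slope, slope / se_slope, df


slope, se_slope, t_stat, df = slope_t_test(pressure, mld)

# Two-sided 5% critical value of Student's t with 6 degrees of freedom
# (from standard tables); valid only because df == 6 for this data set.
T_CRITICAL_DF6 = 2.447
significant = abs(t_stat) > T_CRITICAL_DF6

print(f"slope = {slope:.4f}, SE = {se_slope:.4f}, t = {t_stat:.2f}, df = {df}")
print("slope significantly different from zero at the 5% level:", significant)
```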

Linear regression

Page 12: What you will learn

[Figure: three scatterplot panels]

Linear regression: different models and precisions

Page 13: What you will learn

After squaring the differences and some further algebra, the relationship becomes:

Total deviance = Residual deviance + Regression deviance

The ratio of regression deviance to residual deviance, which equals R2 / (1 - R2), can be used to test the statistical significance of the regression model, ie the null hypothesis that β equals zero

$$\frac{\text{Regr. Dev.}}{\text{Res. Dev.}}$$

Linear regression

Page 14: What you will learn

• The best index of regression accuracy is the coefficient of determination: R2

• It varies between 0 (no accuracy) and 1.0 (perfect accuracy)

• In other words, R2 expresses the proportion of the variability of the dependent variable which can be explained by variations in the independent variable

• Beware of R2>0.90 in biology: in most cases such values are fraudulent
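The deviance decomposition and R2 can be checked numerically with a short sketch. The data are invented for illustration and the variable names are hypothetical; the F ratio shown assumes simple linear regression, where the regression deviance has 1 degree of freedom and the residual deviance has n - 2.

```python
# Hypothetical illustrative data (NOT from the slides).
pressure = [8.0, 10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0]
mld = [2.5, 2.3, 2.9, 2.6, 3.1, 2.8, 3.3, 3.0]

n = len(pressure)
mean_x = sum(pressure) / n
mean_y = sum(mld) / n
sxx = sum((xi - mean_x) ** 2 for xi in pressure)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(pressure, mld))
slope = sxy / sxx
intercept = mean_y - slope * mean_x
predicted = [intercept + slope * xi for xi in pressure]

# Total deviance: observed values around their mean.
total_dev = sum((yi - mean_y) ** 2 for yi in mld)
# Residual deviance: observed values around the regression line.
residual_dev = sum((yi - pi) ** 2 for yi, pi in zip(mld, predicted))
# Regression deviance: estimated values around the mean.
regression_dev = sum((pi - mean_y) ** 2 for pi in predicted)

r_squared = regression_dev / total_dev
# F ratio for simple regression: 1 and n - 2 degrees of freedom.
f_ratio = regression_dev / (residual_dev / (n - 2))

print(f"total = {total_dev:.4f}, residual = {residual_dev:.4f}, "
      f"regression = {regression_dev:.4f}")
print(f"R2 = {r_squared:.3f}, F = {f_ratio:.2f}")
```

In simple linear regression this F ratio is the square of the t statistic for the slope, so both tests lead to the same conclusion.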

Linear regression

Page 15: What you will learn

The difference between observed values and estimated values can be defined by:

$(y_i - \bar{y}) = (y_i - y_i') + (y_i' - \bar{y})$

[Figure: scatterplot with fitted regression line y = 0.5199x + 222.51 (R2 = 0.0022), marking an observed value (y), the estimated value (y'), and the mean of y]

Linear regression

Page 16: What you will learn

Regression

Mauri et al, Circulation 2005

Page 17: What you will learn

Regression

Mauri et al, Circulation 2005

Page 18: What you will learn

Regression

Mauri et al, Circulation 2005

Page 19: What you will learn

What you will learn

• Linear regression and correlation for bivariate analysis

– Simple linear regression

– Regression diagnostics

– Correlation analysis

– Non-parametric alternatives: Spearman rho

Page 20: What you will learn

Regression diagnostics

• Once a regression model has been constructed, it may be important to confirm the goodness of fit of the model and its statistical significance

• Common checks of goodness of fit are: R2, analyses of the pattern of residuals (which must be randomly and normally distributed, and have constant variance) and hypothesis testing

• Statistical significance can be checked by an F-test of the overall fit, followed by t-tests of individual parameters

• Interpretations of these diagnostic tests rest heavily on the model assumptions

Page 21: What you will learn

Regression diagnostics

• Although examination of the residuals can be used to invalidate a model, the results of a t-test or F-test are sometimes more difficult to interpret if the model's assumptions are violated

• If the error term does not have a normal distribution, in small samples the estimated parameters will not follow normal distributions, which complicates inference

• With relatively large samples, however, the central limit theorem (CLT) can be invoked such that hypothesis testing may proceed using asymptotic approximations

Page 22: What you will learn

Residuals

• Residuals are the differences between the observed and the predicted values of Y at each value of X

• They should be randomly and normally distributed, without any apparent trend or curvature

• The plot of the residuals against X provides a visual assessment of the distribution of the residuals – this distribution should appear random (Crawley’s “sky at night”) if the model reasonably predicts the trend in Y
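A minimal sketch of how residuals are obtained from a fitted line follows. The data are invented for illustration and the variable names are hypothetical. The two sums printed at the end are zero by construction for any least squares line with an intercept, so they only confirm the arithmetic; judging trend or curvature still requires looking at the residuals against X.

```python
# Hypothetical illustrative data (NOT from the slides).
pressure = [8.0, 10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0]
mld = [2.5, 2.3, 2.9, 2.6, 3.1, 2.8, 3.3, 3.0]

n = len(pressure)
mean_x = sum(pressure) / n
mean_y = sum(mld) / n
sxx = sum((xi - mean_x) ** 2 for xi in pressure)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(pressure, mld))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residual = observed Y minus the Y predicted by the regression line.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(pressure, mld)]

# Text version of the residual plot: one row per observation.
for xi, ri in zip(pressure, residuals):
    print(f"pressure = {xi:5.1f}  residual = {ri:+.3f}")

# Both sums are (numerically) zero for a least squares fit with intercept.
residual_sum = sum(residuals)
residual_x_sum = sum(ri * xi for ri, xi in zip(residuals, pressure))
print(f"sum of residuals = {residual_sum:.2e}")
print(f"sum of residual * x = {residual_x_sum:.2e}")
```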

Page 23: What you will learn

Residual plots

Page 24: What you will learn

Residual plots

Page 25: What you will learn

Residual plots

Page 26: What you will learn

Checklist for linear regression

To check that linear regression is an appropriate analysis for these data, ask yourself these questions

•Q1: Can the relationship between X and Y be graphed as a straight line? In many experiments the relationship between X and Y is curved, making linear regression inappropriate. Either transform the data, or use a program that can perform nonlinear curve fitting

•Q2: Is the scatter of data around the line Gaussian (at least approximately)?   Linear regression analysis assumes that the scatter is Gaussian

•Q3: Is the variability the same everywhere? Linear regression assumes that scatter of points around the best-fit line has the same standard deviation all along the curve. The assumption is violated if the points with high or low X values tend to be further from the best-fit line. The assumption that the standard deviation is the same everywhere is termed homoscedasticity

Page 27: What you will learn

Checklist for linear regression

• Q4: Do you know the X values precisely? The linear regression model assumes that X values are exactly correct, and that experimental error or biological variability only affects the Y values. This is rarely the case, but it is sufficient to assume that any imprecision in measuring X is very small compared to the variability in Y.

• Q5: Are the data points independent? Whether one point is above or below the line is a matter of chance, and does not influence whether another point is above or below the line.

• Q6: Are the X and Y values intertwined? If the value of X is used to calculate Y (or the value of Y is used to calculate X) then linear regression calculations are invalid. One example would be a graph of midterm LVEF (X) vs. long-term LVEF (Y). Since the midterm LVEF is a component of the long-term LVEF, linear regression is not valid for these data
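For Q3 in the checklist above (same variability everywhere), one crude numerical check is to compare the spread of the residuals at low and high X values. The sketch below is only a rough screen under stated assumptions: the data are invented, the helper `rms` and the half-split are arbitrary choices, and a residual plot or a formal test of heteroscedasticity would be more informative.

```python
import math

# Hypothetical illustrative data (NOT from the slides).
pressure = [8.0, 10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0]
mld = [2.5, 2.3, 2.9, 2.6, 3.1, 2.8, 3.3, 3.0]

n = len(pressure)
mean_x = sum(pressure) / n
mean_y = sum(mld) / n
sxx = sum((xi - mean_x) ** 2 for xi in pressure)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(pressure, mld))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Pair each X with its residual and sort by X.
pairs = sorted((xi, yi - (intercept + slope * xi)) for xi, yi in zip(pressure, mld))
half = n // 2
low_residuals = [r for _, r in pairs[:half]]
high_residuals = [r for _, r in pairs[half:]]


def rms(values):
    """Root mean square, used here as a simple measure of residual spread."""
    return math.sqrt(sum(v * v for v in values) / len(values))


spread_ratio = rms(high_residuals) / rms(low_residuals)
print(f"residual spread, low X: {rms(low_residuals):.3f}")
print(f"residual spread, high X: {rms(high_residuals):.3f}")
print(f"ratio (values far from 1 suggest unequal variability): {spread_ratio:.2f}")
```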

Page 28: What you will learn

• More than one independent variable can be included in the model, yielding a multiple linear regression model:

Y = a + β1X1 + β2X2 + β3X3 + ….

• Statistical analysis can even simultaneously appraise the quantitative contribution of each β!
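A multiple linear regression with two independent variables can be sketched by solving the normal equations directly. Everything here is an assumption made for illustration: the second predictor (stent diameter), the data, and the function name `fit_multiple` are hypothetical, and Y is generated without noise from known coefficients so that the recovered values can be checked. Real analyses would rely on a statistical package.

```python
def fit_multiple(columns, y):
    """Least squares coefficients [a, b1, b2, ...] via the normal equations."""
    n = len(y)
    # Design matrix rows: a leading 1 for the constant, then each predictor.
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


# Hypothetical predictors (NOT from the slides).
pressure = [8.0, 10.0, 12.0, 14.0, 16.0, 18.0]   # X1, atm
stent_diameter = [2.5, 3.0, 2.5, 3.5, 3.0, 3.5]  # X2, mm
# Noise-free Y built from known coefficients: a = 1.0, b1 = 0.05, b2 = 0.5.
mld = [1.0 + 0.05 * x1 + 0.5 * x2 for x1, x2 in zip(pressure, stent_diameter)]

constant, beta1, beta2 = fit_multiple([pressure, stent_diameter], mld)
print(f"Y = {constant:.3f} + {beta1:.3f} * X1 + {beta2:.3f} * X2")
```

Each β is the expected change in Y for a unit change in its own X while the other X is held fixed, which is how the separate contribution of each variable is appraised.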

Multiple linear regression

Page 29: What you will learn

What you will learn

• Linear regression and correlation for bivariate analysis

– Simple linear regression

– Regression diagnostics

– Correlation analysis

– Non-parametric alternatives: Spearman rho

Page 30: What you will learn

Correlation

• The square root of the coefficient of determination (R2), taken with the sign of the regression slope, is the correlation coefficient (R); it shows the degree of linear association between 2 continuous variables, but disregards causation

• It takes values between -1.0 (perfect negative association) and +1.0 (perfect positive association), with 0 indicating no linear association

• It can be summarized as a point summary estimate, with specific standard error, 95% confidence interval, and p value
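The link between the correlation coefficient and the coefficient of determination can be illustrated with a short sketch. The data are invented for illustration and the function name `pearson_r` is hypothetical; the standard error, confidence interval and p value are not computed here.

```python
import math

# Hypothetical illustrative data (NOT from the slides).
pressure = [8.0, 10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0]
mld = [2.5, 2.3, 2.9, 2.6, 3.1, 2.8, 3.3, 3.0]


def pearson_r(x, y):
    """Pearson linear correlation coefficient between two samples."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(pressure, mld)
# Reversing the direction of Y flips the sign of R but leaves R squared unchanged.
r_reversed = pearson_r(pressure, [-yi for yi in mld])

print(f"R = {r:.3f}, R squared = {r * r:.3f}")
print(f"R with Y reversed = {r_reversed:.3f}")
```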

K. Pearson

Page 31: What you will learn

Regression and correlation

Briguori et al, Eur Heart J 2002

Page 32: What you will learn

Regression and correlation

Briguori et al, Eur Heart J 2002

Page 33: What you will learn

Correlation

Escolar et al, AJC 2007

Page 34: What you will learn

Correlation

Escolar et al, AJC 2007

Page 35: What you will learn

What about non-linear associations?

Each number corresponds to the correlation coefficient for linear association (R)
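A small sketch makes the point about non-linear associations concrete: in the constructed example below Y is completely determined by X (Y = X squared), yet the linear correlation coefficient is zero. The data and the function name `pearson_r` are assumptions chosen for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson linear correlation coefficient between two samples."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Constructed example: X symmetric around zero, Y a perfect parabola of X.
x_values = [float(v) for v in range(-5, 6)]
y_values = [v * v for v in x_values]

r_parabola = pearson_r(x_values, y_values)
print(f"R for Y = X squared on a symmetric range: {r_parabola:.3f}")
```

An R close to zero therefore rules out a linear association only, which is one more reason to plot the data first.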

Page 36: What you will learn

Dangers of not plotting data

Four sets of data all with the same R=0.81

Page 37: What you will learn

What you will learn

• Linear regression and correlation for bivariate analysis

– Simple linear regression

– Regression diagnostics

– Correlation analysis

– Non-parametric alternatives: Spearman rho

Page 38: What you will learn

Pearson vs Spearman

• Whenever the independent and dependent variables can be assumed to belong to normal distributions, the Pearson linear correlation method can be used, maximizing statistical power and yield

• Whenever the data are sparse, rare, and/or not belonging to normal distributions, the non-parametric Spearman correlation method should be used, which yields the rank correlation coefficient (rho), but not its R2
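The rank correlation can be sketched as the Pearson coefficient computed on ranks, with tied values sharing their average rank. The data below are constructed for illustration (a monotonic but non-linear relationship), and the function names `average_ranks`, `pearson_r` and `spearman_rho` are hypothetical.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Pearson linear correlation coefficient between two samples."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Constructed example: Y doubles at every step, so it rises monotonically
# with X but not along a straight line.
x_values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y_values = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]

pearson_value = pearson_r(x_values, y_values)
rho_value = spearman_rho(x_values, y_values)
tied_ranks = average_ranks([10, 20, 20, 30])

print(f"Pearson R = {pearson_value:.3f}, Spearman rho = {rho_value:.3f}")
print("ranks of [10, 20, 20, 30]:", tied_ranks)
```

Because rho depends only on the ordering of the values, it is unaffected by outlying magnitudes and by any monotonic transformation of either variable.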

C. Spearman

Page 39: What you will learn

Spearman rho

Abbate et al, JACC 2003

Page 40: What you will learn

Spearman rho

Abbate et al, JACC 2003

Page 41: What you will learn

Regression and correlation: do-it-yourself with SPSS

Page 42: What you will learn

Linear regression

Page 43: What you will learn

Linear regression

Page 44: What you will learn

Linear regression

Page 45: What you will learn

Scatterplot

Page 46: What you will learn

Correlation

Page 47: What you will learn

Correlation

Page 48: What you will learn

Correlation

Page 49: What you will learn

Thank you for your attention

For any correspondence: [email protected]

For further slides on these topics feel free to visit the metcardio.org website:

http://www.metcardio.org/slides.html