Biostatistics course Part 15 Correlation Dr. Sc. Nicolas Padilla Raygoza Department of Nursing and Obstetrics Division Health Sciences and Engineering University of Guanajuato Campus Celaya-Salvatierra


Biosketch

Medical Doctor, Autonomous University of Guadalajara.

Pediatrician certified by the Mexican Council of Certification in Pediatrics.

Postgraduate Diploma in Epidemiology, London School of Hygiene and Tropical Medicine, University of London.

Master of Science in Epidemiology, Atlantic International University.

Doctor of Science in Epidemiology, Atlantic International University.

Associate Professor B, Department of Nursing and Obstetrics, Division of Health Sciences and Engineering, University of Guanajuato, Campus Celaya-Salvatierra, Mexico.

Competencies

The reader will know how to relate two quantitative variables.

He or she will know how to display two quantitative variables.

He or she will apply Pearson's r to measure the relationship between two quantitative variables.

Introduction

There are two reasons to examine the relationship between two quantitative variables. Do the values of one variable tend to be higher or lower as the values of the other variable increase? What is the value of one variable when we know the value of the other?

To evaluate the degree of association between two quantitative variables, we use correlation.

Introduction

Correlation is used to study a possible linear association (a straight line) between two quantitative variables. It tells us how strongly the two variables are associated.

First, we see how to display the data, and then we quantify the strength of the association between the two quantitative variables.

Showing the relation

A simple and effective way to examine the relationship between two quantitative variables is a scatter plot.

Each point corresponds to one subject.

Showing the relation

From the graph, can we say that there is an association between age and systolic blood pressure in these women? Yes, systolic blood pressure seems to increase as the women's age increases. For each woman, the age and systolic blood pressure values are used as coordinates in the graph.

If you count the points, there are 40: one point for each woman.

Showing the relation

The graph shows the relationship between hemoglobin level and age in 15 women.

For each woman, the measurements of age and hemoglobin are used as coordinates in the graph.

Showing the relation

To locate a woman's values of x and y, draw a vertical line from her x value and a horizontal line from her y value; her point lies where the two lines cross.

When, specifically, we want to see whether hemoglobin changes with age:

Age is the explanatory variable (independent variable, or exposure).

Hemoglobin is the response variable (dependent variable, or outcome).

Correlation

When we look at a scatter plot, we get an idea of whether there is an association between two quantitative variables.

To measure the degree of association, we calculate a correlation coefficient.

The standard method is Pearson's correlation coefficient, r.

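Pearson's correlation coefficient, r

Pearson's correlation coefficient, r, measures how closely the points are scattered around an underlying linear trend (a straight line). It can take any value between -1 and +1. The formula is:

$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$$

where $\bar{x}$ and $\bar{y}$ are the means of x and y.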
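As a minimal sketch (not the author's code), the formula can be computed directly in Python. The function name `pearson_r` and the example values are hypothetical. On Python 3.10 or later, the standard library's `statistics.correlation` should give the same result and can serve as a cross-check.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient, computed term by term from the formula."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of observations")
    if len(x) < 2:
        raise ValueError("at least two pairs of observations are needed")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Numerator: sum of the products of the deviations from the two means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Denominator: square root of the product of the two sums of squared deviations
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]  # hypothetical values for illustration
print(round(pearson_r(x, y), 2))  # 0.77
```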

Pearson's correlation coefficient, r

[Figure: scatter plot (X axis from 10 to 50, Y axis from 1 to 5) marking a point A, with the distance of A from the mean of X, (x - x̄), and the distance of A from the mean of Y, (y - ȳ).]
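The figure's two distances are the terms multiplied together in the numerator of r. The sketch below lists them for every point, using hypothetical data chosen to fit the figure's axis ranges and a hypothetical helper `deviation_products`. Points above and to the right of both means, or below and to the left, give positive products; points in the other two quadrants give negative ones.

```python
def deviation_products(x, y):
    """For each point, return (dx, dy, dx * dy) with dx = x - mean(x), dy = y - mean(y)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    rows = []
    for xi, yi in zip(x, y):
        dx = xi - mean_x  # horizontal distance of the point from the mean of X
        dy = yi - mean_y  # vertical distance of the point from the mean of Y
        rows.append((dx, dy, dx * dy))
    return rows


# Hypothetical data spanning the figure's axes (X from 10 to 50, Y from 1 to 5)
x = [10, 20, 30, 40, 50]
y = [1, 3, 2, 5, 4]
rows = deviation_products(x, y)
total = sum(product for _, _, product in rows)
print(total)  # 80.0: positive, so r will be positive
```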

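Pearson's correlation coefficient, r

[Figure: two scatter plots, with points lying exactly on a rising straight line (r = +1) and on a falling straight line (r = -1).]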

Correlation

If there is no linear relationship, the correlation is zero.

But be careful: when r = 0, there may still be a strong nonlinear relationship between the two variables.

Always examine the data graphically first.

Assumptions of correlation

A correlation coefficient can be calculated for any dataset. It is more meaningful when both variables have a Normal distribution; such data form an elliptical cloud of points in a scatter plot.

Another assumption for using correlation is that all observations should be independent, meaning that each individual in the study should contribute only one observation of each variable.
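One practical way to respect the independence assumption is to check, before computing r, that no subject appears more than once. The helper below and its record layout (subject ID, age, systolic blood pressure) are hypothetical; it only detects the problem and does not decide how repeated measurements should be handled.

```python
from collections import Counter


def check_one_observation_per_subject(subject_ids):
    """Raise ValueError if any subject contributes more than one observation."""
    counts = Counter(subject_ids)
    repeated = sorted(sid for sid, count in counts.items() if count > 1)
    if repeated:
        raise ValueError(f"subjects with repeated observations: {repeated}")


# Hypothetical records: (subject ID, age, systolic blood pressure)
records = [("W01", 34, 118), ("W02", 41, 125), ("W01", 35, 121)]
try:
    check_one_observation_per_subject([sid for sid, _, _ in records])
except ValueError as err:
    print(err)  # subjects with repeated observations: ['W01']
```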

Interpretation of correlation

The correlation coefficient must lie between -1 and +1. A value of +1 indicates a perfect positive correlation, a value of -1 a perfect negative correlation, and a value of 0 no linear correlation between the two variables.

A high correlation can still correspond to a weak relationship when the data are examined in a scatter plot, for example when a single outlying point produces most of the value of r.

A correlation of 0 does not always indicate the absence of a relationship, because the relationship may be nonlinear.

Showing correlation

There are three points to remember:

The data should be shown in a scatter plot.

The correlation coefficient, r, should be given to two decimal places.

The number of observations should be reported.

Showing correlation

[Figure: scatter plot of data from 10 cities; r = 0.89.]
