Introductory Statistics for Laboratorians dealing with High Throughput Data sets Centers for Disease Control

Upload: kristen-mott

Post on 14-Dec-2015

Page 1: Introductory Statistics for Laboratorians dealing with High Throughput Data sets Centers for Disease Control

Introductory Statistics for Laboratorians dealing with High Throughput Data sets

Centers for Disease Control

Page 2

Graphing the Relationship between Two Variables

Problem 19

• Graph the following on the axis provided
• Write the equation for the line

X – Scale   Y – Scale
2           5
4           9
4           9
5           11
7           15
8           17
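One way to check an answer to Problem 19 is to compute the slope and intercept from two of the points and confirm that the remaining points fall on the same line. The sketch below is only an illustration: the helper name `line_through` is hypothetical and does not come from the slides.

```python
# Problem 19 data, copied from the table above.
points = [(2, 5), (4, 9), (4, 9), (5, 11), (7, 15), (8, 17)]

def line_through(p, q):
    """Return (slope, intercept) of the straight line through points p and q."""
    (x1, y1), (x2, y2) = p, q
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return slope, intercept

# Use the first and last points; any two points with different X values work.
slope, intercept = line_through(points[0], points[-1])

# Confirm that every point in the table falls exactly on that line.
on_line = all(y == slope * x + intercept for x, y in points)

print(f"Y = {slope:g}X + {intercept:g}; all points on line: {on_line}")
# prints: Y = 2X + 1; all points on line: True
```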

Pages 3-4: Problem 19 [graph]

Page 5

Problem 20

• Graph the following on the axis provided
• Write the equation for the line

X – Scale   Y – Scale
2           17
3           15
5           11
6           9
6           9
8           5
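Problem 20 can be worked by hand from the first and last rows of the table. In this sketch the symbols $m$ (slope) and $b$ (intercept) are notation assumed here, not taken from the slides.

$$ m = \frac{5 - 17}{8 - 2} = -2, \qquad b = 17 - (-2)(2) = 21, \qquad Y = -2X + 21 $$

The other rows satisfy the same equation, for example $-2(5) + 21 = 11$, so Y falls by 2 for every 1-unit increase in X.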

Pages 6-7: Problem 20 [graph]

Page 8

Problem 21

• Graph the following on the axis provided
• Describe the relationship between X and Y
• If we had theoretical reasons to believe the relationship is a straight line, what could account for the variability (error)?

X – Scale   Y – Scale
1           2
2           3
2           2
3           5
4           5
5           7
6           5
8           6
9           7
10          8
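To make the "variability (error)" in Problem 21 concrete, a straight line can be fitted to the points and the leftover distance of each point from that line listed. The slides do not say how the line should be chosen, so ordinary least squares is an assumption in this sketch.

```python
# Problem 21 data, copied from the table above.
xs = [1, 2, 2, 3, 4, 5, 6, 8, 9, 10]
ys = [2, 3, 2, 5, 5, 7, 5, 6, 7, 8]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Ordinary least squares (an assumed fitting method):
# slope = S_xy / S_xx, intercept = mean_y - slope * mean_x.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
s_xx = sum((x - mean_x) ** 2 for x in xs)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

# Residual = observed Y minus the Y predicted by the fitted line.
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]

print(f"fitted line: Y = {slope:.3f}X + {intercept:.3f}")
for x, y, e in zip(xs, ys, residuals):
    print(f"X={x:2d}  Y={y}  residual={e:+.2f}")
```

The residuals are the part of each Y value that the straight line does not explain. In a laboratory setting they would typically reflect measurement error or factors other than X.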

Pages 9-10: Problem 21 [graph]

Page 11

Problem 22

• Graph the following on the axis provided
• Describe the relationship between X and Y

X – Scale   Y – Scale
2           8
3           7
3           8
4           5
5           4
5           5
5           3
7           5
8           3
8           2

Pages 12-13: Problem 22 [graph]

Page 14

Problem 23

• Graph the following on the axis provided
• Describe the relationship between X and Y

X – Scale   Y – Scale
0           3
2           8
3           5
4           6
4           5
5           4
5           4
8           7
9           1
10          7

Pages 15-16: Problem 23 [graph]

Page 17

Pearson Correlation Coefficient

• We need a way to quantify how correlated two variables are.
• Pearson invented the correlation coefficient
• Ranges from -1 to +1
• Perfect Positive Correlation = +1
• Perfect Negative Correlation = -1

$$ r = \frac{\sum Z_x Z_y}{N} $$
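The formula says that r is the sum of the products of paired z-scores, divided by N. The sketch below applies it to the data from Problems 21 to 23. Two assumptions are worth flagging: the z-scores use the population standard deviation (dividing by N, to match the N in the formula), and the Problem 23 values are read from the run-together row in the transcript. The function names are illustrative only.

```python
import math

def z_scores(values):
    """Standardize values using the population SD (divide by N, as the formula does)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

def pearson_r(xs, ys):
    """Pearson's r as the mean product of paired z-scores: sum(Zx * Zy) / N."""
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)

# (X, Y) columns from Problems 21, 22 and 23.
problems = {
    "Problem 21": ([1, 2, 2, 3, 4, 5, 6, 8, 9, 10], [2, 3, 2, 5, 5, 7, 5, 6, 7, 8]),
    "Problem 22": ([2, 3, 3, 4, 5, 5, 5, 7, 8, 8], [8, 7, 8, 5, 4, 5, 3, 5, 3, 2]),
    "Problem 23": ([0, 2, 3, 4, 4, 5, 5, 8, 9, 10], [3, 8, 5, 6, 5, 4, 4, 7, 1, 7]),
}

results = {name: pearson_r(xs, ys) for name, (xs, ys) in problems.items()}
for name, r in results.items():
    print(f"{name}: r = {r:+.3f}")
```

With these data r comes out at about +0.88 for Problem 21, -0.85 for Problem 22, and 0 for Problem 23, which matches the upward, downward, and patternless scatter in the three graphs.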

Page 18: Pearson’s r [figure]

Page 19

Page 20

Page 21

Testing the Significance of Pearson’s r

• The null hypothesis is usually that the correlation between X and Y is zero (no relationship, nothing is happening).

• Once you know the degrees of freedom (N − 2 for Pearson’s r), the computer can look up the p-value: the probability of getting a correlation at least this far from zero by chance alone if the true correlation were zero.

• If that probability is less than your chosen alpha, you reject the null hypothesis.
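The lookup described above can be sketched in code. The slides do not name a test, so this sketch assumes the standard t test for Pearson's r, with t = r·sqrt(df / (1 − r²)) and df = N − 2, and integrates the t density numerically instead of calling a statistics library. The function names, the step count, the alpha of 0.05, and the example value r = 0.883 (computed from the Problem 21 table, not stated in the slides) are all illustrative.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p(r, n, steps=10_000):
    """Two-sided p-value for H0: true correlation = 0, given sample r from n pairs."""
    df = n - 2
    if abs(r) >= 1:
        return 0.0
    t = abs(r) * math.sqrt(df / (1 - r * r))
    # Simpson's rule (steps must be even) for the area under the density from 0 to |t|.
    h = t / steps
    area = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    # Probability in both tails beyond |t|.
    return max(0.0, 1 - 2 * area)

alpha = 0.05        # chosen significance level (an example value)
r, n = 0.883, 10    # Problem 21: r is about 0.883 with N = 10 pairs
p = two_sided_p(r, n)
print(f"df = {n - 2}, p = {p:.4f}, reject H0: {p < alpha}")
```

In practice a statistics package reports this p-value directly. The point of the sketch is only that r and the degrees of freedom are all the lookup needs.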

Page 22

Correlation Matrix

Correlation matrix for five variables

Variable   1      2      3      4      5
1          1.00   .29    .68    .05    .17
2          .29    1.00   .44    .22    .03
3          .68    .44    1.00   .39    .12
4          .05    .22    .39    1.00   .41
5          .17    .03    .12    .41    1.00
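A correlation matrix has a fixed structure that is easy to check in code: the diagonal is 1.00, the matrix is symmetric, and only the entries above (or below) the diagonal are distinct. The sketch below reads the five-variable matrix that way. The variable names are illustrative.

```python
# Correlation matrix for five variables, copied from the table above.
corr = [
    [1.00, 0.29, 0.68, 0.05, 0.17],
    [0.29, 1.00, 0.44, 0.22, 0.03],
    [0.68, 0.44, 1.00, 0.39, 0.12],
    [0.05, 0.22, 0.39, 1.00, 0.41],
    [0.17, 0.03, 0.12, 0.41, 1.00],
]
k = len(corr)

# A variable correlates perfectly with itself, so the diagonal is all 1.00.
diagonal_ok = all(corr[i][i] == 1.0 for i in range(k))

# r(X, Y) equals r(Y, X), so the matrix mirrors across the diagonal.
symmetric = all(corr[i][j] == corr[j][i] for i in range(k) for j in range(k))

# Only the k*(k-1)/2 entries above the diagonal are distinct correlations.
pairs = [(corr[i][j], i + 1, j + 1) for i in range(k) for j in range(i + 1, k)]
strongest = max(pairs)
weakest = min(pairs)

print(f"{len(pairs)} distinct pairs")
print(f"strongest: variables {strongest[1]} and {strongest[2]} (r = {strongest[0]:.2f})")
print(f"weakest:   variables {weakest[1]} and {weakest[2]} (r = {weakest[0]:.2f})")
```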

Page 23

Factor Analysis

Correlations between Variables V1 through V5 Showing Two Underlying Factors

      V1     V2     V3     V4     V5
V1    1.00
V2    .80    1.00
V3    .90    .88    1.00
V4    .20    .15    .20    1.00
V5    .10    .05    .10    .90    1.00

V1, V2, and V3 are highly correlated with each other and nearly uncorrelated with V4 and V5.

V4 and V5 are highly correlated with each other and nearly uncorrelated with V1 through V3.

Factor analysis is a technique that identifies this sort of pattern in a correlation matrix.
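The block pattern described above can be made visible with a very simple grouping rule. This is not factor analysis itself, which estimates factor loadings from the whole matrix. It is only a threshold-based sketch of the kind of structure factor analysis is meant to find, and the 0.5 cutoff is a hypothetical choice, not a value from the slides.

```python
# Correlations from the V1..V5 table above, each pair listed once.
names = ["V1", "V2", "V3", "V4", "V5"]
corr = {
    ("V1", "V2"): 0.80, ("V1", "V3"): 0.90, ("V2", "V3"): 0.88,
    ("V1", "V4"): 0.20, ("V2", "V4"): 0.15, ("V3", "V4"): 0.20,
    ("V1", "V5"): 0.10, ("V2", "V5"): 0.05, ("V3", "V5"): 0.10,
    ("V4", "V5"): 0.90,
}
THRESHOLD = 0.5  # hypothetical cutoff for "highly correlated"; not from the slides

def r(a, b):
    """Look up the correlation between two variables in either order."""
    return corr.get((a, b), corr.get((b, a)))

clusters = []
for name in names:
    # Join the first cluster whose members all correlate strongly with this variable.
    for cluster in clusters:
        if all(r(name, member) >= THRESHOLD for member in cluster):
            cluster.append(name)
            break
    else:
        # No existing cluster fits, so this variable starts a new one.
        clusters.append([name])

print(clusters)
# prints: [['V1', 'V2', 'V3'], ['V4', 'V5']]
```

The two groups correspond to the two underlying factors named in the slide title.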