Introduction to Biostatistics (Pubhlth 540) Lecture 3: Numerical Summary Measures


Upload: gabriel-kinney

Post on 01-Jan-2016

1

Introduction to Biostatistics (Pubhlth 540)

Lecture 3: Numerical Summary Measures

Acknowledgement: Thanks to Professor Pagano (Harvard School of Public Health) for lecture material

2

Reading/Homework

• See web site

3

For after all, what is man in nature? A Nothing in relation to the infinite, All in relation to nothing, A central point between nothing and all, And infinitely far from understanding either.

Blaise Pascal (1623-1662), Pensées (1660)

4

Example: FEV1 (forced expiratory volume in one second) in 13 adolescents with asthma

Let x represent FEV1 in liters

$x_1 = 2.30$

9

Example: FEV1 (forced expiratory volume in one second) in 13 adolescents with asthma

Let x represent FEV1 in liters

$x_1 = 2.30$, $x_2 = 2.15$, $x_3 = 3.50$, $x_4 = 2.60$, $x_5 = 2.75$, $x_6 = 2.82$,

$x_7 = 4.05$, $x_8 = 2.25$, $x_9 = 2.68$, $x_{10} = 3.00$, $x_{11} = 4.02$, $x_{12} = 2.85$, $x_{13} = 3.38$

10

Measures of central tendency

•Mean

•Median

•Mode

•Population Parameters

•Sample Statistics

11

Measures of central tendency

•Population Parameters

Population Mean: $\mu = \frac{1}{N} \sum_{s=1}^{N} x_s$

Population Variance: $\sigma^2 = \frac{1}{N} \sum_{s=1}^{N} (x_s - \mu)^2$

12

Measures of central tendency: Mean

Example: FEV1 (forced expiratory volume in one second) in 13 adolescents with asthma

Sample Numbers: $x_1, x_2, \ldots, x_n$

Sample Mean: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$

2.30, 2.15, 3.50, 2.60, 2.75, 2.82, 4.05, 2.25, 2.68, 3.00, 4.02, 2.85, 3.38 (n = 13)

Sum: $\sum_{i=1}^{n} x_i = 38.35$

Mean: $\bar{x} = \frac{38.35}{13} = 2.95$
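The sample mean calculation above can be sketched in a few lines of Python. This is only an illustration of the formula, using the 13 FEV1 values from the example; the function name `sample_mean` is made up for this sketch.

```python
# FEV1 values (liters) for the 13 adolescents in the example.
fev1 = [2.30, 2.15, 3.50, 2.60, 2.75, 2.82, 4.05,
        2.25, 2.68, 3.00, 4.02, 2.85, 3.38]


def sample_mean(values):
    """Return the arithmetic mean: the sum of the values divided by n."""
    return sum(values) / len(values)


n = len(fev1)
total = sum(fev1)
x_bar = sample_mean(fev1)

# Rounded to two decimals, as on the slide.
print(f"n = {n}, sum = {total:.2f}, mean = {x_bar:.2f}")
```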

13

If we collect a man's urine during twenty four hours and mix all this urine to analyze the average, we get an analysis of a urine which simply does not exist; for urine when fasting, is different from urine during digestion. A startling instance of this kind was invented by a physiologist who took urine from a railroad station urinal where people of all nations passed, and who believed he could thus present an analysis of average European urine!

Claude Bernard (1813-1878)

18

Mean: Examples

Approx. 4 million singleton births, 1991:

Variable           Mean
Mother’s age       26.4 years
Gestational age    39.15 weeks
Birth weight       3358.6 grams
Weight gain*       30.4 lbs

Of 31,417 singleton births resulting in death:

Variable           Mean
Survival           49.4 days

19

Mean: Properties

[Figure: distribution plotted against years, with 26.4 years (the mean mother’s age) marked]

20

Mean: Properties

Note what happens when one number, 4.02 say, becomes large, say 40.2:

2.30, 2.15, 3.50, 2.60, 2.75, 2.82, 4.05, 2.25, 2.68, 3.00, 40.2, 2.85, 3.38

Mean: $\bar{x} = 5.73$

(versus 2.95, from before)

The mean is sensitive to every observation; it is not robust.
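A short sketch of the same experiment in Python, assuming the 13 FEV1 values from the example and replacing 4.02 with 40.2 as on the slide. The variable names are chosen for this illustration only.

```python
# FEV1 values (liters) for the 13 adolescents in the example.
fev1 = [2.30, 2.15, 3.50, 2.60, 2.75, 2.82, 4.05,
        2.25, 2.68, 3.00, 4.02, 2.85, 3.38]

# Same data, but with the single value 4.02 replaced by 40.2.
fev1_changed = [40.2 if x == 4.02 else x for x in fev1]

mean_before = sum(fev1) / len(fev1)
mean_after = sum(fev1_changed) / len(fev1_changed)

# One changed observation moves the mean a long way.
print(f"mean before: {mean_before:.2f}")
print(f"mean after:  {mean_after:.2f}")
```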

21

Measures of central tendency: Median

More robust, but not sensitive enough.

Definition: At least 50% of the observations are greater than or equal to the median, and at least 50% of the observations are less than or equal to the median.

2.15, 2.25, 2.30 --- median = 2.25

2.15, 2.25, 2.30, 2.60 --- median = $\frac{1}{2}(2.25 + 2.30) = 2.275$
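The two cases above (odd and even number of observations) can be written as a small function. This is a minimal sketch, not a library routine; the name `sample_median` is an assumption of this example. The last lines reuse the FEV1 data with 4.02 changed to 40.2 to illustrate why the median is called more robust.

```python
def sample_median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


median_odd = sample_median([2.15, 2.25, 2.30])
median_even = sample_median([2.15, 2.25, 2.30, 2.60])
print(median_odd, round(median_even, 3))

# FEV1 values (liters), and the same data with 4.02 replaced by 40.2.
fev1 = [2.30, 2.15, 3.50, 2.60, 2.75, 2.82, 4.05,
        2.25, 2.68, 3.00, 4.02, 2.85, 3.38]
fev1_changed = [40.2 if x == 4.02 else x for x in fev1]

# The middle value of the sorted data does not move when one extreme value changes.
median_before = sample_median(fev1)
median_after = sample_median(fev1_changed)
print(median_before, median_after)
```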

22

Comparing mean and median

Singleton births, 1991:

Variable              Mean    Median
Mom’s age (yrs)       26.4    25
Gest. age (wks)       39.2    39
Birth weight (gms)    3359    3374
Weight gain (lbs)     30.4    30
Survival (days)       49.4    7

23

[Figure: distribution of birth weight (gms). Mean = 3359, Median = 3374]

24

25

[Figure: distribution of weight gain (lbs). Mean = 30.4, Median = 30]

26

Mortality in the first year of baby's life (for those who die in their first year)

[Figure: proportion (vertical axis, 0.00 to 0.40) by survival days (horizontal axis, 0 to 305)]

27

Mortality in the first year of baby's life (for those who die in their first year)

[Figure: proportion by survival days (horizontal axis, 0 to 335; vertical axis ticks up to 1.00)]

Mean = 49.4 Median = 7

28

When to use mean or median:

Use both by all means.

Mean performs best when we have a symmetric distribution with thin tails.

If skewed, use the median.

Remember: the mean follows the tail.

Comparing mean and median

29

Mode

• Mode is defined as the observation that occurs most frequently

• When the distribution is symmetric and unimodal, all three measures of central tendency are equal

30

Comparing mean, median and mode

[Figure: bimodal distribution, with the two modes and the mean and median marked]
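A small sketch of the bimodal case, using hypothetical values invented for this illustration (they are not from the lecture data). The values are symmetric about 2.95, so the mean and median coincide, but the two most frequent values sit on either side. `multimode` is from Python's standard `statistics` module (Python 3.8 or later).

```python
from statistics import mean, median, multimode

# Hypothetical, made-up values: symmetric about 2.95 with two peaks.
values = [2.2, 2.5, 2.5, 2.8, 3.1, 3.4, 3.4, 3.7]

center_mean = mean(values)
center_median = median(values)
modes = multimode(values)  # every value that occurs most frequently

print(f"mean = {center_mean:.2f}, median = {center_median:.2f}, modes = {modes}")
```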

31

Measures of spread

• Range:
  – Simple to calculate
  – Very sensitive to extreme observations

• Inter Quartile Range (IQR)
  – More robust than the range

• Variance (Standard Deviation):
  – Quantifies the amount of variability around the mean

32

Measures of spread: Range

Singleton births, 1991:

Variable        Min    Max     Range
Mom’s age       10     49      39
Gest. age       17     47      30
Birth weight    227    8164    7937
Weight gain     0      98      98
Survival        0      363     363
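As a rough illustration of range versus IQR, the sketch below applies both to the FEV1 example data, then repeats the calculation with 4.02 replaced by 40.2. Quartile conventions differ between textbooks and software; this sketch assumes the default ("exclusive") method of `statistics.quantiles` from the Python standard library, so the quartiles may not match a hand calculation done with another rule.

```python
from statistics import quantiles


def data_range(values):
    """Largest observation minus smallest observation."""
    return max(values) - min(values)


def iqr(values):
    """Third quartile minus first quartile (default 'exclusive' quantile method)."""
    q1, _q2, q3 = quantiles(values, n=4)
    return q3 - q1


# FEV1 values (liters), and the same data with 4.02 replaced by 40.2.
fev1 = [2.30, 2.15, 3.50, 2.60, 2.75, 2.82, 4.05,
        2.25, 2.68, 3.00, 4.02, 2.85, 3.38]
fev1_changed = [40.2 if x == 4.02 else x for x in fev1]

range_before, range_after = data_range(fev1), data_range(fev1_changed)
iqr_before, iqr_after = iqr(fev1), iqr(fev1_changed)

# The range follows the extreme value; the IQR depends only on the middle of the data.
print(f"range: {range_before:.2f} -> {range_after:.2f}")
print(f"IQR:   {iqr_before:.2f} -> {iqr_after:.2f}")
```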

35

Measures of spread: Variance

FEV1     $(x_j - \bar{x})$    $(x_j - \bar{x})^2$
2.30     -0.65                0.423
2.15     -0.80                0.640
3.50      0.55                0.303
2.60     -0.35                0.123
2.75     -0.20                0.040
2.82     -0.13                0.017
4.05      1.10                1.210
2.25     -0.70                0.490
2.68     -0.27                0.073
3.00      0.05                0.003
4.02      1.07                1.145
2.85     -0.10                0.010
3.38      0.43                0.185
Total     0.00                4.66

36

Measures of spread: Variance

Sample Variance: $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 \geq 0$

e.g. $s^2 = \frac{4.66}{12} = 0.39$ liters$^2$

Population Variance: $\sigma^2 = \frac{1}{N} \sum_{s=1}^{N} (x_s - \mu)^2$
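The steps behind the sample variance (deviations from the mean, squared deviations, their total, division by n - 1) can be traced in Python. This is a sketch on the FEV1 example data; the result is cross-checked against `statistics.variance`, which also divides by n - 1.

```python
from statistics import variance

# FEV1 values (liters) for the 13 adolescents in the example.
fev1 = [2.30, 2.15, 3.50, 2.60, 2.75, 2.82, 4.05,
        2.25, 2.68, 3.00, 4.02, 2.85, 3.38]

n = len(fev1)
x_bar = sum(fev1) / n

# Step 1: deviations from the mean (these always total zero, up to rounding).
deviations = [x - x_bar for x in fev1]

# Step 2: squared deviations and their total.
sum_sq = sum(d ** 2 for d in deviations)

# Step 3: divide by n - 1 for the sample variance.
sample_var = sum_sq / (n - 1)

print(f"total of deviations = {sum(deviations):.2f}")
print(f"total of squared deviations = {sum_sq:.2f}")
print(f"sample variance = {sample_var:.2f} liters^2")
print(f"library check   = {variance(fev1):.2f} liters^2")
```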

37

Measures of spread: Variance

Standard deviation = $+\sqrt{\text{Variance}}$

e.g. $s = \sqrt{0.39} = 0.62$ liters

Standard deviation takes on the same unit as the mean
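A brief sketch of the same step in Python on the FEV1 example data. `statistics.stdev` returns the positive square root of the sample variance (divisor n - 1); taking the square root of the rounded variance 0.39, as on the slide, gives the same answer to two decimals.

```python
from math import sqrt
from statistics import stdev

# FEV1 values (liters) for the 13 adolescents in the example.
fev1 = [2.30, 2.15, 3.50, 2.60, 2.75, 2.82, 4.05,
        2.25, 2.68, 3.00, 4.02, 2.85, 3.38]

sd = stdev(fev1)                # square root of the unrounded sample variance
sd_from_rounded = sqrt(0.39)    # square root of the rounded variance from the slide

# Both are in liters, the same unit as the mean.
print(f"s = {sd:.2f} liters, from rounded variance: {sd_from_rounded:.2f} liters")
```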

38

Empirical Rule:

If dealing with a unimodal and symmetric distribution, then

Mean ± 1 sd covers approx 67% obs.

Mean ± 2 sd covers approx 95% obs.

Mean ± 3 sd covers approx all obs.

Variance & Standard deviation
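One way to get a feel for the empirical rule is to check it on simulated data. The sketch below is an assumption-laden illustration: it draws hypothetical values from a normal distribution (one particular unimodal, symmetric shape) with an arbitrary seed, and the mean and sd used to generate the data are borrowed from the mother's age example purely for scale. It is not an analysis of the actual birth records.

```python
import random
from statistics import mean, stdev

random.seed(540)  # arbitrary seed, only so the run is repeatable

# Hypothetical data: 100,000 draws from a normal distribution.
sample = [random.gauss(26.4, 5.84) for _ in range(100_000)]

x_bar = mean(sample)
sd = stdev(sample)


def proportion_within(values, center, spread, k):
    """Proportion of values lying within k spreads of the center."""
    lower, upper = center - k * spread, center + k * spread
    inside = sum(1 for v in values if lower <= v <= upper)
    return inside / len(values)


coverage = {k: proportion_within(sample, x_bar, sd, k) for k in (1, 2, 3)}
for k, p in coverage.items():
    print(f"mean ± {k} sd covers {p:.1%} of the simulated observations")
```

For real data that are only roughly symmetric and unimodal, the observed proportions will differ somewhat from the rule's approximate figures.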

46

Characterizing a symmetric, unimodal distribution – mean, SD

Mother’s age: mean = 26.4 yrs, s.d. = 5.84 yrs

Table of $\bar{x} \pm k$ s.d.s

k    left limit    right limit    Emp.
1    20.56         32.24          67%
2    14.72         38.08          95%
3     8.88         43.92          all

47

Characterizing a symmetric, unimodal distribution – mean, SD

[Figure: distribution of mother’s age (years); area between 20.56 and 32.24 = 0.6475]

48

Characterizing a symmetric, unimodal distribution – mean, SD

[Figure: distribution of mother’s age (years); area between 14.72 and 38.08 = 0.963]

49

Characterizing a symmetric, unimodal distribution – mean, SD

Mother’s age: mean = 26.4 yrs, s.d. = 5.84 yrs

Table of $\bar{x} \pm k$ s.d.s

k    left limit    right limit    Emp.    Actual
1    20.56         32.24          67%     64.75%
2    14.72         38.08          95%     96.3%
3     8.88         43.92          all     99.89%
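The left and right limits in the table are simply the mean minus and plus k standard deviations. A minimal sketch in Python, using the mean and s.d. of mother's age from the slide; the helper name `limits` is made up for this example.

```python
mean_age = 26.4  # years, from the slide
sd_age = 5.84    # years, from the slide


def limits(center, spread, k):
    """Return (left, right) limits of center ± k * spread."""
    return center - k * spread, center + k * spread


rows = []
for k in (1, 2, 3):
    left, right = limits(mean_age, sd_age, k)
    rows.append((k, round(left, 2), round(right, 2)))
    print(f"k = {k}: {left:.2f} to {right:.2f} years")
```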

54

Characterizing a distribution – Chebyshev’s inequality

Chebyshev’s Inequality

Table of $\bar{x} \pm k$ s.d.s

Proportion is at least $1 - 1/k^2$

(true for any distribution)

k    $1/k^2$    $1 - 1/k^2$    Emp.    Actual
1    1          0              67%     64.75%
2    0.25       0.75           95%     96.3%
3    0.11       0.89           all     99.89%
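The bound in the table can be computed directly. The sketch below evaluates $1 - 1/k^2$ for k = 1, 2, 3 and compares it with the actual proportions for mother's age quoted on the slide; the function name `chebyshev_lower_bound` is an assumption of this example.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of observations within k s.d.s of the mean (any distribution)."""
    if k <= 0:
        raise ValueError("k must be positive")
    return max(0.0, 1 - 1 / k ** 2)


# Actual proportions for mother's age, as quoted on the slide.
actual = {1: 0.6475, 2: 0.963, 3: 0.9989}

bounds = {k: chebyshev_lower_bound(k) for k in (1, 2, 3)}
for k in (1, 2, 3):
    print(f"k = {k}: at least {bounds[k]:.2f}, actual {actual[k]:.4f}")
```

The bound holds for every distribution, which is why it is much weaker than the empirical rule for a symmetric, unimodal one.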

55

Summary

• Distributions can be described using:
  – Measures of central tendency
  – Measures of dispersion

• Measures of central tendency:
  – Mean, Median, Mode

• Measures of dispersion:
  – Range, IQR, Variance, Standard Deviation

• Characterizing distributions:
  – Chebyshev’s inequality
  – Empirical rule for symmetric, unimodal distributions

56

Questions

• In a certain real estate market, the average price of a single family home was $325,000 and the median price was $225,000. Percentiles were computed for this distribution. Is the difference between the 90th and 50th percentile likely to be bigger than, about the same as, or less than the difference between the 50th and 10th percentile? Explain briefly.

http://www.stat.berkeley.edu/users/rice/Stat2/Chapt4.pdf

57

Questions

http://www.stat.berkeley.edu/users/rice/Stat2/Chapt4.pdf

58

Questions

• 1. The average high temperature for Minneapolis is closest to (a) 45 degrees (b) 60 degrees (c) 75 degrees (d) 85 degrees

• 2. The SD of the high temperatures for Minneapolis is closest to (a) 1 degree (b) 3 degrees (c) 5 degrees (d) 20 degrees

• 3. The average high temperature for Minneapolis is ________ the average high temperature for Belle Glade. (a) at least ten degrees less than (b) about the same as (c) at least ten degrees higher than

• 4. The average high temperature for Minneapolis is ________ the average high temperature for Olga. (a) at least ten degrees less than (b) about the same as (c) at least ten degrees higher than

• 5. The SD of the high temperatures for Minneapolis is ________ the SD of the high temperatures for Belle Glade. (a) about half of (b) about the same as (c) about twice

http://www.stat.berkeley.edu/users/rice/Stat2/Chapt4.pdf