Application of Statistical Techniques to Interpretation of Water Monitoring Data

Post on 22-Feb-2016



Application of Statistical Techniques to Interpretation of Water Monitoring Data

Eric Smith, Golde Holtzman, and Carl Zipper

Outline

I. Water quality data: program design (CEZ, 15 min)

II. Characteristics of water-quality data (CEZ, 15 min)

III. Describing water quality (GIH, 30 min)

IV. Data analysis for making decisions

A. Compliance with numerical standards (EPS, 45 min)

Dinner Break

B. Locational / temporal comparisons (“cause and effect”) (EPS, 45 min)

C. Detection of water-quality trends (GIH, 60 min)

III. Describing water quality (GIH, 30 min)

• Rivers and streams are an essential component of the biosphere
• Rivers are alive
• Life is characterized by variation
• Statistics is the science of variation
• Statistical Thinking/Statistical Perspective
• Thinking in terms of variation
• Thinking in terms of distribution

The present problem is multivariate

• WATER QUALITY as a function of
• TIME, under the influence of co-variates like
• FLOW, at multiple
• LOCATIONS

WQ variable versus time

[Figure: schematic plot of a water variable (vertical axis) against time in years (horizontal axis).]

Bear Creek below Town of Wise STP

[Figure: PH (axis ticks from 6.5 to 9) plotted against DATE, 1973/12/14 to 1993/12/14.]

Univariate WQ Variable

[Figure: build sequence of panels with axes labelled Time and Water Quality.]

Univariate Perspective, Real Data (pH below STP)

[Figure: pH below the STP; axis ticks from 6.5 to 9.]

The three most important pieces of information in a sample:

• Central Location
– Mean, Median, Mode
• Dispersion
– Range, Standard Deviation, Inter-Quartile Range
• Shape
– Symmetry, skewness, kurtosis
– No mode, unimodal, bimodal, multimodal

Central Location: Sample Mean

• (Sum of all observations) / (sample size)
• Center of gravity of the distribution
• Depends on each observation
• Therefore sensitive to outliers


Central Location: Sample Median

• Center of the ordered array
• I.e., the (0.5)(n + 1)th observation in the ordered array.

If sample size n is odd, then the median is the middle value in the ordered array.

Example A: 1, 1, 0, 2, 3
Order: 0, 1, 1, 2, 3
n = 5, odd
(0.5)(n + 1) = 3
Median = 1

If sample size n is even, then the median is the average of the two middle values in the ordered array.

Example B: 1, 1, 0, 2, 3, 6
Order: 0, 1, 1, 2, 3, 6
n = 6, even
(0.5)(n + 1) = 3.5
Median = (1 + 2)/2 = 1.5
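The location rule can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides; the function name `sample_median` is made up here, and the two calls reuse Examples A and B.

```python
def sample_median(values):
    """Median found at position (0.5)(n + 1) of the ordered array."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("sample_median needs at least one observation")
    location = 0.5 * (n + 1)   # 1-based position in the ordered array
    lower = int(location)      # rank at or just below the location
    if location == lower:      # n odd: the location is a whole rank
        return ordered[lower - 1]
    # n even: the location falls halfway between two ranks, so average them
    return (ordered[lower - 1] + ordered[lower]) / 2


median_a = sample_median([1, 1, 0, 2, 3])      # Example A
median_b = sample_median([1, 1, 0, 2, 3, 6])   # Example B
print(median_a, median_b)
```

Python's standard library offers `statistics.median`, which follows the same odd/even rule.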

Central Location: Sample Median

• Center of the ordered array
• Depends on the magnitude of the central observations only
• Therefore NOT sensitive to outliers


Central Location: Mean vs. Median

• Mean is influenced by outliers
• Median is robust against (resistant to) outliers
• Mean “moves” toward outliers
• Median represents bulk of observations almost always

Comparison of mean and median tells us about outliers
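A small sketch of this contrast, using made-up pH-like readings (hypothetical values, not the Bear Creek data): adding one extreme value pulls the mean toward it and leaves the median where it was.

```python
from statistics import mean, median

# Hypothetical pH-like readings, invented for illustration only.
readings = [7.4, 7.5, 7.6, 7.6, 7.7, 7.8]
with_outlier = readings + [12.0]   # one implausibly high value added

# How far each summary moves when the outlier is included
mean_shift = mean(with_outlier) - mean(readings)
median_shift = median(with_outlier) - median(readings)

print(f"mean shift:   {mean_shift:.3f}")
print(f"median shift: {median_shift:.3f}")
```

In this example a gap between mean and median appears only after the outlier is added, which is the sense in which comparing the two hints at outliers.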

Dispersion

• Range
• Standard Deviation
• Inter-quartile Range

Dispersion: Range

• Maximum - Minimum
• Easy to calculate
• Easy to interpret
• Depends on sample size (biased)
• Therefore not good for statistical inference
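One way to see the dependence on sample size is that the range can only stay the same or grow as observations accumulate. The sketch below assumes ten illustrative values taken in the order listed; the variable names are made up for this example.

```python
# Ten illustrative observations, in the order they were collected.
sample = [0, -3.1, -0.4, 0, 2.2, 5.1, 3.8, 3.8, 3.9, 2.3]

# Range (Maximum - Minimum) of the first k observations, for k = 1..n
running_range = [
    max(sample[:k]) - min(sample[:k])
    for k in range(1, len(sample) + 1)
]

for k, r in enumerate(running_range, start=1):
    print(f"n = {k:2d}  range = {r:.1f}")
```

Because a larger sample has more chances to include extreme values, ranges from samples of different sizes are not directly comparable.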

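Dispersion: Standard Deviation

$$SD = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1}}$$

[Figure: number lines marking deviations of −1, +1 and −2, +2 about the mean; extracted panel labels read SD = 10 and SD = 2.]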

Dispersion: Properties of SD

• SD ≥ 0 for all data
• SD = 0 if and only if all observations are the same (no variation)
• Familiar intervals for a normal distribution:
– 68% expected within 1 SD,
– 95% expected within 2 SD,
– 99.7% expected within 3 SD,
– Exact for the normal distribution, ballpark for any distribution
• For any distribution, nearly all observations lie within 3 SD of the mean (at least 8/9, about 89%, by Chebyshev's inequality)
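The normal-distribution percentages can be checked with the standard library's `NormalDist`. This is a sketch for orientation only; the helper name `coverage_within` is made up here, and the Chebyshev lower bound 1 − 1/k² is included as the distribution-free counterpart.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)


def coverage_within(k):
    """Probability that a normal observation lies within k SD of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


coverage = {k: coverage_within(k) for k in (1, 2, 3)}

# Chebyshev's inequality: at least 1 - 1/k^2 within k SD, for any distribution
chebyshev_bound = {k: 1 - 1 / k**2 for k in (2, 3)}

for k in (1, 2, 3):
    print(f"within {k} SD (normal): {coverage[k]:.4f}")
for k in (2, 3):
    print(f"within {k} SD (any distribution, at least): {chebyshev_bound[k]:.4f}")
```

The 3 SD figure for the normal distribution comes out near 99.7%, while the guaranteed figure for an arbitrary distribution is about 89%.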

Interpretation of SD

[Figure: distribution plot; axis ticks from 6.5 to 9.]

n = 200
SD = 0.41
Median = 7.6
Mean = 7.6

Quartiles, Percentiles, Quantiles, Five Number Summary, Boxplot

Maximum = 4th quartile = 100th percentile = 1.00 quantile
3rd quartile = 75th percentile = 0.75 quantile
Median = 2nd quartile = 50th percentile = 0.50 quantile
1st quartile = 25th percentile = 0.25 quantile
Minimum = 0th quartile = 0th percentile = 0.00 quantile

Quartiles (undergrad classes)

E.g., Sample: 0, −3.1, −0.4, 0, 2.2, 5.1, 3.8, 3.8, 3.9, 2.3, n = 10

Rank   Value
10     5.1     Maximum
9      3.9
8      3.8     3rd Quartile
7      3.8
6      2.3
               Median, 2nd Quartile (midway between ranks 5 and 6)
5      2.2
4      0
3      0       1st Quartile
2      −0.4
1      −3.1    Minimum

Max = 5.1
Q3 = 3.8
Q2 = (2.2 + 2.3)/2 = 2.25
Q1 = 0
Min = −3.1

Note: Quartiles Q0, Q1, Q2, Q3, Q4 = Quantiles Q0.00, Q0.25, Q0.50, Q0.75, Q1.00

5-Number Summary and Boxplot (undergrad perspective)

Min      Q1      Q2      Q3      Max
−3.10    0.00    2.25    3.80    5.10

Median = Q2 = 2.25
Range = Max − Min = 5.10 − (−3.10) = 8.20
IQR = Q3 − Q1 = 3.80 − 0.00 = 3.80
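The undergrad-style numbers can be reproduced by taking Q1 and Q3 as the medians of the lower and upper halves of the ordered array. That reading matches the n = 10 example; how to split when n is odd is not stated in the slides, so dropping the middle value from both halves is an assumption of this sketch, and the function name is made up.

```python
from statistics import median


def five_number_summary(values):
    """Min, Q1, Q2, Q3, Max with quartiles as medians of the two halves."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower_half = ordered[:half]        # smallest n // 2 observations
    upper_half = ordered[n - half:]    # largest n // 2 observations
    # Assumption: for odd n the middle value belongs to neither half.
    return (ordered[0], median(lower_half), median(ordered),
            median(upper_half), ordered[-1])


sample = [0, -3.1, -0.4, 0, 2.2, 5.1, 3.8, 3.8, 3.9, 2.3]
minimum, q1, q2, q3, maximum = five_number_summary(sample)
value_range = maximum - minimum
iqr = q3 - q1
print(minimum, q1, q2, q3, maximum)
print(f"Range = {value_range:.2f}, IQR = {iqr:.2f}")
```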

Terminology Warning:

Quartiles, a.k.a. Percentiles, a.k.a. Quantiles

Quartiles = Percentiles = Quantiles

Q4 = 4th quartile = Max = 100th percentile = Q1.00 = 1.00 quantile
Q3 = 3rd quartile = 75th percentile = Q0.75 = 0.75 quantile
Q2 = 2nd quartile = Med = 50th percentile = Q0.50 = 0.50 quantile
Q1 = 1st quartile = 25th percentile = Q0.25 = 0.25 quantile
Q0 = 0th quartile = Min = 0th percentile = Q0.00 = 0.00 quantile

Terminology Warning:

But Percentiles and Quantiles are more general

Q4 = 4th quartile = Max = 100th percentile = Q1.00 = 1.00 quantile
95th percentile = Q0.95 = 0.95 quantile
Q3 = 3rd quartile = 75th percentile = Q0.75 = 0.75 quantile
60th percentile = Q0.60 = 0.60 quantile
Q2 = 2nd quartile = Med = 50th percentile = Q0.50 = 0.50 quantile
34th percentile = Q0.34 = 0.34 quantile
Q1 = 1st quartile = 25th percentile = Q0.25 = 0.25 quantile
2.5th percentile = Q0.025 = 0.025 quantile
Q0 = 0th quartile = Min = 0th percentile = Q0.00 = 0.00 quantile

Quantile Location and Quantiles by weighted averages (graduate classes)

E.g., Sample: 0, −3.1, −0.4, 0, 2.2, 5.1, 3.8, 3.8, 3.9, 2.3, n = 10

Step 1: qth Quantile Location: L = q(n + 1)
Step 2: qth Quantile: Q = a + w(b − a)

Example: Find the 20th percentile of the sample above.

Step 1:
q = 0.20, n = 10
L = 0.20(10 + 1) = 2.2
indicating the “2.2th” observation in the ordered array.

Step 2: Therefore the 0.20 quantile is a weighted average of the 2nd and 3rd observations in the ordered array, which are
a = −0.4, b = 0
and the weight is the fractional part of L,
w = 0.2
Q = −0.4 + 0.2(0 − (−0.4)) = −0.40 + 0.08 = −0.32
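The two steps translate directly into code. The sketch below is illustrative only: the function name `quantile` is made up, and clamping the location to the ranks 1 through n (so that q near 0 or 1 returns the minimum or maximum) is an assumption added so the function is defined for every q.

```python
import math


def quantile(values, q):
    """q-th quantile by weighted average, with location L = q(n + 1)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0 or not 0 <= q <= 1:
        raise ValueError("need a non-empty sample and 0 <= q <= 1")
    location = q * (n + 1)                 # Step 1: L = q(n + 1)
    location = min(max(location, 1), n)    # assumption: clamp to ranks 1..n
    rank = math.floor(location)            # whole part of L
    w = location - rank                    # weight: fractional part of L
    a = ordered[rank - 1]                  # observation at that rank
    if w == 0:
        return a
    b = ordered[rank]                      # next observation up
    return a + w * (b - a)                 # Step 2: Q = a + w(b - a)


sample = [0, -3.1, -0.4, 0, 2.2, 5.1, 3.8, 3.8, 3.9, 2.3]
q20 = quantile(sample, 0.20)
print(f"20th percentile = {q20:.2f}")
```

Statistical packages implement several different quantile definitions, so results from other software can differ slightly from this rule.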

Quantile Location and Quantiles by weighted averages (graduate classes)

E.g., Sample: 0, −3.1, −0.4, 0, 2.2, 5.1, 3.8, 3.8, 3.9, 2.3, n = 10

Step 1: qth Quantile Location: L = q(n + 1)
Step 2: qth Quantile: Q = a + w(b − a)

Step 2:
a = −0.4, b = 0, w = 0.2
Q = a + w(b − a)
= −0.4 + 0.2(0 − (−0.4))
= −0.4 + 0.2(0.4)
= −0.40 + 0.08
= −0.32

[Figure: number line from a = −0.4 to b = 0 (interval width 0.4), with Q = −0.32 marked.]

Quantile Location and Quantiles

Example: 0, −3.1, −0.4, 0, 2.2, 5.1, 3.8, 3.8, 3.9, 2.3, n = 10

Value   Rank
5.1     10
3.9     9
3.8     8
3.8     7
2.3     6
2.2     5
0       4
0       3
−0.4    2
−3.1    1

Quantile rank, q   Quantile Location, L    Quantile, Q                         Common Name
1.00               n = 10                  5.1                                 Maximum
0.75               0.75(10 + 1) = 8.25     3.8 + 0.25(3.9 − 3.8) = 3.825       3rd Quartile
0.50               0.5(10 + 1) = 5.5       2.2 + 0.5(2.3 − 2.2) = 2.25         Median, or 2nd Quartile
0.25               0.25(10 + 1) = 2.75     −0.4 + 0.75[0 − (−0.4)] = −0.1      1st Quartile
0.00               1                       −3.1                                Minimum

5-Number Summary and Boxplot using weighted averages for quantiles

Min      Q1       Q2      Q3       Max
−3.10    −0.10    2.25    3.825    5.10

Median = Q2 = 2.25
Range = Max − Min = 5.10 − (−3.10) = 8.20
IQR = Q3 − Q1 = 3.825 − (−0.10) = 3.925

Note slightly different results by using weighted averages.
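For readers working in Python, the standard library's `statistics.quantiles` with `method="exclusive"` places the q-th quantile at position q(n + 1) in the ordered array, so it should reproduce the weighted-average quartiles. The sketch below is an illustration under that assumption, not the tool used to prepare the slides.

```python
from statistics import quantiles

sample = [0, -3.1, -0.4, 0, 2.2, 5.1, 3.8, 3.8, 3.9, 2.3]

# n=4 asks for the three cut points that split the data into quarters.
q1, q2, q3 = quantiles(sample, n=4, method="exclusive")

minimum, maximum = min(sample), max(sample)
value_range = maximum - minimum
iqr = q3 - q1

print(f"Min {minimum:.2f}  Q1 {q1:.2f}  Q2 {q2:.2f}  Q3 {q3:.3f}  Max {maximum:.2f}")
print(f"Range = {value_range:.2f}, IQR = {iqr:.3f}")
```

Switching to `method="inclusive"` gives a different interpolation rule and, in general, slightly different quartiles, which is one more reason to report which definition was used.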

Dispersion: IQR (Inter-Quartile Range)

• (3rd Quartile) − (1st Quartile)
• Robust against outliers

Interpretation of IQR

[Figure: distribution plot; axis ticks from 6.5 to 9.]

n = 200
SD = 0.41
Median = 7.6
Mean = 7.6
IQR = 0.54

For a Normal distribution, Median ± 2 IQR includes 99.3%
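The 99.3% figure can be checked numerically for a standard normal distribution. This is a minimal sketch using the standard library's `NormalDist`; the variable names are made up.

```python
from statistics import NormalDist

standard_normal = NormalDist()   # mean 0, SD 1

# IQR of the standard normal, in SD units (75th minus 25th percentile)
iqr = standard_normal.inv_cdf(0.75) - standard_normal.inv_cdf(0.25)

# Probability of falling within 2 IQR of the median (the median is 0 here)
coverage = standard_normal.cdf(2 * iqr) - standard_normal.cdf(-2 * iqr)

print(f"IQR = {iqr:.3f} SD")
print(f"Median ± 2 IQR covers {coverage:.1%}")
```

Since the normal IQR is about 1.35 SD, the interval of 2 IQR either side of the median is roughly 2.7 SD either side.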

Shape: Symmetry and Skewness

• “Symmetry” means bilateral symmetry, skewness = 0
• Mean = Median (approximately)
• Positive Skewness (asymmetric “tail” in positive direction)
• Mean > Median
• Negative Skewness (asymmetric “tail” in negative direction)
• Mean < Median

Comparison of mean and median tells us about shape
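A short sketch of the mean-versus-median comparison on two made-up samples, one with a long tail to the right and its mirror image. The skewness function uses the moment coefficient (third central moment over the 1.5 power of the second), which is one common definition and an assumption here, since the slides do not give a formula.

```python
from statistics import mean, median


def sample_skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (one common definition)."""
    n = len(values)
    y_bar = mean(values)
    m2 = sum((y - y_bar) ** 2 for y in values) / n   # second central moment
    m3 = sum((y - y_bar) ** 3 for y in values) / n   # third central moment
    return m3 / m2 ** 1.5


# Hypothetical samples, invented for illustration.
right_skewed = [1, 1, 2, 2, 3, 4, 15]          # tail in the positive direction
left_skewed = [-y for y in right_skewed]       # mirror image

for name, data in (("right", right_skewed), ("left", left_skewed)):
    print(f"{name}: mean = {mean(data):.2f}, median = {median(data):.2f}, "
          f"skewness = {sample_skewness(data):.2f}")
```

The mean-median comparison is a quick screen rather than a guarantee; the relationship can fail for some unusual distributions.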

[Figure: Bear Creek below Town of Wise STP; plots with axis ticks from 6.5 to 9.]

Outlier Box Plot

[Figure: annotated box plot. Labels: Outliers; Whisker (upper and lower); Median; 75th %-tile = 3rd Quartile; 25th %-tile = 1st Quartile; IQR.]
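The pieces of an outlier box plot can be computed directly. The slides do not state how the whiskers and outliers are determined, so this sketch assumes the widely used convention of fences at 1.5 IQR beyond the quartiles, with whiskers drawn to the most extreme observations inside the fences. The readings and the function name are hypothetical.

```python
from statistics import quantiles


def outlier_box_plot_stats(values, fence_multiplier=1.5):
    """Quartiles, whisker ends and outliers under the 1.5 IQR convention."""
    ordered = sorted(values)
    q1, med, q3 = quantiles(ordered, n=4, method="exclusive")
    iqr = q3 - q1
    lower_fence = q1 - fence_multiplier * iqr   # assumed convention
    upper_fence = q3 + fence_multiplier * iqr   # assumed convention
    inside = [y for y in ordered if lower_fence <= y <= upper_fence]
    outliers = [y for y in ordered if y < lower_fence or y > upper_fence]
    return {
        "q1": q1, "median": med, "q3": q3, "iqr": iqr,
        "whisker_low": inside[0], "whisker_high": inside[-1],
        "outliers": outliers,
    }


# Hypothetical pH-like readings, invented for illustration.
readings = [7.1, 7.3, 7.4, 7.5, 7.6, 7.6, 7.7, 7.8, 7.9, 9.0]
stats = outlier_box_plot_stats(readings)
print(stats)
```

Plotting packages differ in their whisker rules and quartile definitions, so the same data can yield slightly different box plots in different software.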

Wise, VA, below STP

[Figure: panels for pH (axis ticks 6.5 to 9) and TKN (mg/l, axis ticks 0 to 13).]

Wise, VA below STP

[Figure: panels for DO (% saturation, axis ticks 10 to 130) and BOD (mg/l, axis ticks 0 to 25).]

Wise, VA below STP

[Figure: panels for Total Phosphorus (mg/l, axis ticks 0 to 5) and Fecal Coliforms (axis ticks 0 to 60000).]
