1 of 37 Key Concepts Underlying DQOs and VSP DQO Training Course Day 1 Module 4 (60 minutes) (75 minute lunch break) Presenter: Sebastian Tindall

Upload: kevin-sims

Post on 03-Jan-2016

TRANSCRIPT

1 of 37

Key Concepts Underlying DQOs and VSP

DQO Training Course Day 1

Module 4

(60 minutes) (75-minute lunch break)

Presenter: Sebastian Tindall

2 of 37

Key Points

Have fun while learning key statistical concepts using hands-on illustrations

This module prepares the way for a more in-depth look at the DQO Process and the use of VSP

3 of 37

The Big Picture

Decision Error

Sampling Cost

Remediation Cost

Health Risk

Waste Disposal Cost

Compliance

Schedule

4 of 37

Our Focus

Sampling Cost

Unnecessary Disposal and/or Cleanup Cost

5 of 37

Balance in Sampling Design

The statistician’s aim in designing surveys and experiments is to meet a desired degree of reliability at the lowest possible cost under the existing budgetary, administrative, and physical limitations within which the work must be conducted. In other words, the aim is efficiency--the most information (smallest error) for the money.

Deming, W.E., Some Theory of Sampling, 1950

6 of 37

Our Methodology: Use Hands-On Illustrations of...

Basic statistical concepts needed for VSP and the DQO Process

Using... Visual Sample Plan

7 of 37

Our Methodology: Use Hands-On Illustrations of...

Basic statistical concepts needed for VSP and the DQO Process

Using coin flips
– Pennies
– Quarter

Demo #1 and Demo #2

8 of 37

How Many Samples Should We Take?

5?

50?

9 of 37

How Many Times Should I Flip a Coin Before I Decide it is Contaminated (Biased Tails)?

Consecutive tails    Chance if the coin is fair
One                  50%
Two                  25%
Three                12.5%
Four                 6%
Five                 3%
Six                  1.6%
Seven                0.8%
Eight                0.4%
Nine                 0.2%
Ten                  0.1%
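The percentages on this slide match what a fair coin gives for an unbroken run of tails: one half multiplied by itself once per flip. A minimal sketch of that arithmetic follows; the function name `prob_all_tails` is made up for illustration and is not part of the course material.

```python
def prob_all_tails(flips: int, p_tail: float = 0.5) -> float:
    """Chance that every one of `flips` tosses lands tails.

    p_tail = 0.5 assumes a fair (uncontaminated) coin.
    """
    return p_tail ** flips


# Print the chance of 1 through 10 tails in a row for a fair coin.
for flips in range(1, 11):
    print(f"{flips:2d} tails in a row: {prob_all_tails(flips):.2%}")
```

The slide rounds these values, so small differences in the last digit are expected.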

10 of 37

Football Field

[Figure: one-acre football field, with a dimension label of 30'0"]

11 of 37

Example Problem

A 1-acre field was contaminated with mill tailings in the 1960s

Cleanup standard:

– “The mean 226Ra concentration in the upper 6 inches of soil must be less than 6.0 pCi/g.”

There is a good chance that the actual mean 226Ra concentration is between 4.0 and 6.0 pCi/g

12 of 37

Example Problem (cont.)

Historical data suggest a standard deviation of 1.6 pCi/g

It costs $1,000 to collect, process, and analyze one sample

The maximum sampling budget is $5,000
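The cost and budget figures fix the largest sample count the example can afford, and the historical standard deviation then gives the standard error of the mean at that count. The sketch below uses only the numbers from the example problem; the variable and function names are illustrative.

```python
import math

COST_PER_SAMPLE = 1000  # dollars per sample, from the example problem
BUDGET = 5000           # dollars, maximum sampling budget from the example
STD_DEV = 1.6           # pCi/g, historical standard deviation from the example

# Largest whole number of samples the budget can pay for.
max_samples = BUDGET // COST_PER_SAMPLE


def standard_error(std_dev: float, n: int) -> float:
    """Standard error of the mean: std_dev / sqrt(n)."""
    return std_dev / math.sqrt(n)


print("affordable samples:", max_samples)
print("standard error at that n (pCi/g):",
      round(standard_error(STD_DEV, max_samples), 3))
```

The standard error is the denominator of the one-sample t statistic, so it shows how much precision the budget buys.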

13 of 37

Graph of Perfect Decision Making

[Figure: chance of deciding the site is dirty (0.0 to 1.0) plotted against the true mean 226Ra concentration (low to high), with the action level marked at 6 pCi/g. Curve shown: Ideal Rule.]

14 of 37

Graph of Typical Decision Making

[Figure: chance of deciding the site is dirty (0.0 to 1.0) plotted against the true mean 226Ra concentration (low to high), with the action level marked at 6 pCi/g. Curve shown: Typical Curve.]

15 of 37

Marbles

Ra-226, pCi/g    Color
9                Black
8                Blue
7                Dark Yellow
6                Red
5                Green
4                White
3                Clear

16 of 37

Simplified Decision Process

• Take some number of samples
• Find the average 226Ra concentration in our samples
• If we pass the appropriate QA/G-9 test, decide the site is clean
• If we fail the appropriate QA/G-9 test, decide the site is dirty

17 of 37

Example of Ad Hoc Sampling Design and the Results

Suppose we choose to take 5 samples for various reasons: low cost, tradition, convenience, etc.

• Need volunteer to do the sampling
• Need volunteer to record results
• We will follow QA/G-9 One-Sample t-Test directions using an Excel spreadsheet

18 of 37

One-Sample t-Test Equation from EPA’s Practical Methods for Data Analysis, QA/G-9

$$\text{Calculated } t = \frac{\text{sample mean} - \mathrm{AL}}{\text{std. dev} / \sqrt{n}}$$

If calculated t is less than table value, decide site is clean
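A minimal sketch of this calculation in Python rather than Excel follows. The five sample values are hypothetical (whole numbers in the range of the marble values), and the sign convention is an assumption: because the null hypothesis used later in this module is "site is dirty", the "table value" is taken here as the negative of the tabulated one-sided 95% Student's t value for n - 1 = 4 degrees of freedom (2.132).

```python
import math
import statistics

ACTION_LEVEL = 6.0     # pCi/g, from the example problem
T_CRIT_95_DF4 = 2.132  # one-sided 95% Student's t value, 4 degrees of freedom


def one_sample_t(samples, action_level):
    """Calculated t = (sample mean - AL) / (sample std. dev / sqrt(n))."""
    n = len(samples)
    mean = statistics.mean(samples)
    s = statistics.stdev(samples)  # sample standard deviation (n - 1 denominator)
    return (mean - action_level) / (s / math.sqrt(n))


# Hypothetical results for five samples, in pCi/g (not from the course).
samples = [4.0, 5.0, 3.0, 6.0, 4.0]

t = one_sample_t(samples, ACTION_LEVEL)

# Assumed convention: decide "clean" only if t falls below the negative table value.
decision = "clean" if t < -T_CRIT_95_DF4 else "dirty"
print(f"calculated t = {t:.3f}, decision: {decision}")
```

The critical value is hard-coded for five samples; a different n needs the table value for n - 1 degrees of freedom.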

19 of 37

Comparing UCL to Action Level is Like Student’s t-Test

[Figure: number line of true mean 226Ra concentration from 2 to 8, with the action level at 6 and four example UCLs each marked with an X]

UCL = 4: 4 - 6 = -2
UCL = 5: 5 - 6 = -1
UCL = 7: 7 - 6 = 1
UCL = 8: 8 - 6 = 2
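The equivalence named in this slide's title can be checked numerically: asking whether the 95% UCL falls below the action level gives the same answer as asking whether the calculated t falls below the negative critical value. The sketch below assumes five samples (so the one-sided 95% t value for 4 degrees of freedom, 2.132, applies); both data sets are hypothetical.

```python
import math
import statistics

ACTION_LEVEL = 6.0     # pCi/g, from the example problem
T_CRIT_95_DF4 = 2.132  # one-sided 95% Student's t value; valid only for n = 5


def ucl95(samples):
    """95% upper confidence limit on the mean: mean + t * s / sqrt(n)."""
    n = len(samples)
    return statistics.mean(samples) + T_CRIT_95_DF4 * statistics.stdev(samples) / math.sqrt(n)


def t_stat(samples):
    """One-sample t statistic against the action level."""
    n = len(samples)
    return (statistics.mean(samples) - ACTION_LEVEL) / (statistics.stdev(samples) / math.sqrt(n))


def agree(samples):
    """True when the UCL rule and the t-test rule reach the same decision."""
    return (ucl95(samples) < ACTION_LEVEL) == (t_stat(samples) < -T_CRIT_95_DF4)


low = [4, 5, 3, 6, 4]   # hypothetical samples with a mean below the action level
high = [6, 7, 5, 8, 6]  # hypothetical samples with a mean above the action level

for data in (low, high):
    print(f"UCL = {ucl95(data):.2f}, t = {t_stat(data):.2f}, rules agree: {agree(data)}")
```

The two rules are the same inequality rearranged, so `agree` should hold for any five-sample data set with nonzero spread.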

20 of 37

21 of 37

Key Concepts Defined

Latin Letters and Concepts

N    population size (population unit)
n    number of samples
x̄    sample mean is a statistic
s    sample standard deviation is a statistic
H0   null hypothesis (action level)

Greek Letters and Concepts

μ    population mean is a statistical parameter
σ    population standard deviation is a statistical parameter
α    alpha error rate
β    beta error rate
Δ    width of the gray region

22 of 37

Learn the Jargon

• t-test
• UCL - upper confidence limit
• AL - action level
• N - target population
• n - population units sampled
• μ - population mean
• x̄ - sample mean
• σ - population standard deviation
• s - sample standard deviation
• H0 - null hypothesis
• α - alpha error rate
• β - beta error rate
• Δ - width of gray region

23 of 37

t-test

$$\text{Calculated } t = \frac{\text{sample mean} - \mathrm{AL}}{s / \sqrt{n}}$$

If calculated t is less than table value, decide site is clean

24 of 37

Upper Confidence Limit, UCL

For a 95% UCL and assuming sufficient n:

If you repeatedly calculate 95% UCLs for many independent random sampling events, in the long run, you would be correct 95% of the time in claiming that the true mean is less than or equal to your UCLs.

Note: Different s will produce different UCLs

$$\mathrm{UCL} = \bar{X} + \left[ t_{1-\alpha,\,df} \cdot \left( s / \sqrt{n} \right) \right]$$
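The long-run reading of a 95% UCL can be illustrated by simulation: draw many independent sets of samples, compute a UCL from each, and count how often the true mean is at or below it. This is a sketch under stated assumptions that are not in the slides: measurements are normally distributed, the true mean is 5.0 pCi/g, the standard deviation is the 1.6 pCi/g from the example problem, and each event uses five samples (so the t value is 2.132).

```python
import math
import random
import statistics

T_CRIT_95_DF4 = 2.132  # one-sided 95% Student's t value, 4 degrees of freedom
N_SAMPLES = 5          # samples per sampling event (matches the t value above)


def ucl95(samples):
    """95% upper confidence limit on the mean: mean + t * s / sqrt(n)."""
    return statistics.mean(samples) + T_CRIT_95_DF4 * statistics.stdev(samples) / math.sqrt(len(samples))


def coverage(true_mean, std_dev, trials=20000, seed=1):
    """Fraction of simulated sampling events whose UCL is >= the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Assumption: normally distributed measurements.
        samples = [rng.gauss(true_mean, std_dev) for _ in range(N_SAMPLES)]
        if true_mean <= ucl95(samples):
            hits += 1
    return hits / trials


rate = coverage(true_mean=5.0, std_dev=1.6)
print(f"true mean at or below the UCL in {rate:.1%} of sampling events")
```

Each simulated event produces a different UCL, which is why the 95% statement is about the procedure over many events and not about any single interval.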

25 of 37

Upper Confidence Limit, UCL

A more common interpretation, which some experts dislike: For a single UCL, you are 95% confident that the true mean is less than or equal to your calculated UCL.

(See Hahn and Meeker, Statistical Intervals: A Guide for Practitioners, p. 31.)

26 of 37

Action Level

A measurement threshold value of the Population Parameter (e.g., true mean) that provides the criterion for choosing among alternative actions.

27 of 37

N

Target Population: The set of N population units about which inferences will be made

Population Units: The N objects (environmental units) that make up the target or sampled population

n

The number of population units selected and measured is n

28 of 37

10 x 10 Field

Population = All 100 Population Units

29 of 37

10 x 10 Field

Population = All 100 Population Units

Sample = 5 Population Units

Values shown for the 5 sampled units: 1.5, 1.5, 2.3, 1.7, 1.9
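Choosing n = 5 units out of the N = 100 in the grid can be sketched as below. The slide does not say how its five units were picked, so simple random sampling without replacement is an assumption here, and the function name `pick_units` is illustrative.

```python
import random

ROWS, COLS = 10, 10  # the 10 x 10 field: N = 100 population units
N_SAMPLES = 5        # n = 5 units selected for measurement


def pick_units(seed=None):
    """Pick N_SAMPLES distinct (row, column) cells by simple random sampling."""
    rng = random.Random(seed)
    cells = [(row, col) for row in range(ROWS) for col in range(COLS)]
    return rng.sample(cells, N_SAMPLES)  # sampling without replacement


chosen = pick_units(seed=0)
print("units to sample (row, column):", chosen)
```

Passing a seed makes the selection repeatable, which helps when the sampling plan has to be documented.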

30 of 37

Population Mean, μ

The average of all N population units

$$\mu = \frac{1}{N} \sum_{i=1}^{N} X_i$$

Sample Mean, X̄

The average of the n population units actually measured

$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$$

31 of 37

Population Standard Deviation, σ

The average deviation of all N population units from the population mean

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$$

Sample Standard Deviation, s

The “average” deviation of the n measured units from the sample mean

$$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$
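The difference between the population and sample versions (dividing by N versus n - 1) can be seen with Python's standard `statistics` module. The sample below reuses the five values from the 10 x 10 field slide; the 100-unit population is hypothetical, generated only to have something to average.

```python
import statistics

# The five measured values shown on the 10 x 10 field slide.
sample = [1.5, 1.5, 2.3, 1.7, 1.9]

x_bar = statistics.mean(sample)  # sample mean
s = statistics.stdev(sample)     # sample standard deviation (n - 1 denominator)

# Hypothetical population of N = 100 units (not from the course).
population = [1.5 + 0.01 * i for i in range(100)]

mu = statistics.mean(population)       # population mean
sigma = statistics.pstdev(population)  # population standard deviation (N denominator)

print(f"sample:     mean = {x_bar:.2f}, s = {s:.3f}")
print(f"population: mean = {mu:.3f}, sigma = {sigma:.3f}")
```

`stdev` and `pstdev` differ only in the denominator, which mirrors the two standard deviation formulas.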

32 of 37

The Null Hypothesis, H0

The initial assumption about how the true mean relates to the action level

Example: The site is dirty. (We’ll assume this for the rest of this discussion)

$$H_0: \mu \ge \text{Action Level}$$

33 of 37

The Alternate Hypothesis, HA

The alternative hypothesis is accepted only when there is overwhelming proof that the Null condition is false.

$$H_A: \mu < \text{Action Level}$$

34 of 37

The Alpha Error Rate (Type 1, False +)

The chance of deciding that a dirty site is clean when the true mean is equal to the action level

The Beta Error Rate (Type 2, False -)

The chance of deciding a clean site is dirty when the true mean is equal to the lower bound of the gray region (LBGR)

(Null Hypothesis = Site is Dirty)
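Both error rates can be estimated by simulating the five-sample t-test many times at the two true means named in the definitions. This is a sketch under assumptions not stated in the slides: measurements are normally distributed, the LBGR is set at 4.0 pCi/g (the low end of the range the example problem gives for the likely true mean), and the test uses the one-sided 95% t value for 4 degrees of freedom.

```python
import math
import random
import statistics

ACTION_LEVEL = 6.0     # pCi/g, from the example problem
LBGR = 4.0             # pCi/g, assumed lower bound of the gray region
STD_DEV = 1.6          # pCi/g, historical standard deviation from the example
N_SAMPLES = 5          # samples per sampling event
T_CRIT_95_DF4 = 2.132  # one-sided 95% Student's t value, 4 degrees of freedom


def decide_clean(samples):
    """Apply the t-test with null hypothesis 'site is dirty'."""
    t = (statistics.mean(samples) - ACTION_LEVEL) / (
        statistics.stdev(samples) / math.sqrt(len(samples)))
    return t < -T_CRIT_95_DF4


def chance_clean(true_mean, trials=20000, seed=1):
    """Fraction of simulated sampling events that end in a 'clean' decision."""
    rng = random.Random(seed)
    clean = 0
    for _ in range(trials):
        # Assumption: normally distributed measurements.
        samples = [rng.gauss(true_mean, STD_DEV) for _ in range(N_SAMPLES)]
        if decide_clean(samples):
            clean += 1
    return clean / trials


alpha = chance_clean(ACTION_LEVEL)  # dirty site (true mean = AL) called clean
beta = 1.0 - chance_clean(LBGR)     # clean site (true mean = LBGR) called dirty
print(f"estimated alpha = {alpha:.3f}, estimated beta = {beta:.3f}")
```

The estimated alpha should sit near the 5% the t value was chosen for, while beta depends on the sample count, the standard deviation, and where the LBGR is placed.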

35 of 37

The Width of Gray Region, Δ

$$\mathrm{AL} - \mathrm{LBGR} = \Delta$$

Gray Region = AL - LBGR

The lower bound of the gray region (LBGR) is defined as the hypothetical true mean concentration where the site should be declared clean with a reasonably high probability
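As a worked instance for the example problem, suppose the LBGR is set at 4.0 pCi/g. That value is an assumption taken from the low end of the range given for the likely true mean; the slides do not fix it. With the 6.0 pCi/g action level, the width of the gray region, written Δ here, would be:

```latex
\Delta = \mathrm{AL} - \mathrm{LBGR} = 6.0 - 4.0 = 2.0\ \text{pCi/g}
```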

36 of 37

Summary

Decisions about population parameters, such as the true mean, μ, and the true standard deviation, σ, are based on statistics such as the sample mean, X̄, and the sample standard deviation, s. Since these decisions are based on incomplete information, they can be in error.

37 of 37

End of Module 4

Thank you

Questions?

We will now take a 75-minute lunch break.

Please be back at 1:00 pm.