Download - FOUR METHODS OF ESTIMATING PM 2.5 ANNUAL AVERAGES Yan Liu and Amy Nail Department of Statistics North Carolina State University EPA Office of Air Quality,

FOUR METHODS OF ESTIMATING FOUR METHODS OF ESTIMATING PMPM2.52.5 ANNUAL AVERAGES ANNUAL AVERAGES

Yan Liu and Amy NailDepartment of StatisticsNorth Carolina State UniversityEPA Office of Air Quality, Planning, and StandardsEmissions Monitoring, and Analysis Division

Project ObjectivesProject Objectives

Estimation of annual average of PM2.5 concentration Estimation of standard errors associated with annual average

estimates Estimation of the probability that a site’s annual average

exceeds 15 mg/m3 At 2400 lattice points for 2000, 2001 Comparisons of 4 different methodologies:

1. Quarter-based analysis (Yan)2. Annual-based analysis (Yan) Daily-based analyses:3. Doug Nychka’s method (Bill)4. Generalized least squares in SAS Proc Mixed (Amy)

Why are Standard Errors Important?Why are Standard Errors Important?

We may estimate that the annual average for lattice point 329 is 16 mg/m3, which exceeds the standard of 15. But since our estimate has some uncertainty or standard error, we’d like to take this uncertainty into account in order to determine the probability that lattice point 329 exceeds 15.

In addition to maps like this ...In addition to maps like this ...

……we also want maps like this.we also want maps like this.

Note: This Map is WRONG--so don’t show it to anyone! We haven’t figured out the correct way to determine errors, so we cannot correctly draw a probability map yet.

Data DescriptionData Description

Concentrations of PM2.5 measured during 2000, 2001

The domain analyzed: the portion of the U.S. east of –100o longitude

Concentrations measured every third day

Map of 2400 Lattice PointsMap of 2400 Lattice Points

Method 1 – Quarterly AnalysisMethod 1 – Quarterly Analysis

3 months in each quarter • Q1(Jan. - Mar.) Q2(Apr. - Jun.)• Q3(Jul. - Sep.) Q4(Oct. - Dec.)

Within quarters, 75% completeness Found quarter mean conc. at each site For each quarter, kriged mean conc. over

lattice Averaged the quarter predictions to get

annual average estimate

Annual Average PredictionsAnnual Average Predictions

Method 2 – Annual AnalysisMethod 2 – Annual Analysis

Used sites common to all 4 quarters in quarterly analysis

Found annual mean conc. at each site

Kriged annual mean conc. over lattice

The Number of SitesThe Number of Sites

2000 2001

Quarter 1 510 631

Quarter 2 575 642

Quarter 3 619 682

Quarter 4 613 666

Annual 394 517

Model for Quarterly and Annual AnalysesModel for Quarterly and Annual Analyses

Predicted value =

quadratic surface prediction (SP)

+

error prediction (KP)

Estimating Quadratic Surface Estimating Quadratic Surface

Model: Conc = 0 + 1lat + 2lon + 3lat2 + 4lon2 + 5lat * lon +

Assume: 1) E() = 0, Var() = 2 I 2) The betas are estimated by SAS assuming errors iid

Fit parameters using ordinary least squares in

SAS proc reg

Obtained surface predictions (SP) and their standard errors (SEsp) and the ’s

Kriging the Error SurfaceKriging the Error Surface

Model: {(s) : s R2} E((s) )

= 0 Var((s) - (s’) ) = 0 if s=s’ 2

n + 2(1- e-dist/) if ss’

Estimated variogram parameters using nonlinear least squares in Splus

Obtained kriging predictions (KP) and their standard errors (SEkp)

Variogram ModelsVariogram Models

3 commonly used variogram models:– Exponential

(h)=1 – exp (-3h/a)

– Spherical (h)=1.5 • (h/a) - 0.5 • (h/a)3 if h a (h)=1 otherwise

– Gaussian (h)=1 - exp (-3h2 /a2)

a: range

h: distanceSpherical model

Exponential model

Gaussian model

range

sill

h

(h)

Cross Validation to Select Variogram ModelCross Validation to Select Variogram Model

Idea: temporarily remove the sample value at a particular location one at a time, estimate this value from remaining data using the different variogram models.

Prediction error = observed - predicted

MSE = 1/(n-1) (prediction error)2

2000

0

0.5

1

1.5

2

2.5

3

3.5

4

4.5

Q1 Q2 Q3 Q4 annual.real

MSE

2001

0

0.5

1

1.5

2

2.5

3

3.5

4

Q1 Q2 Q3 Q4 annual.real

Exponential

Gaussian

Spherical

Cross Validation MSE for Three Variogram ModelsCross Validation MSE for Three Variogram Models

• Exponential model has the least MSE.• Conclusion: use Exponential model

Calculating Predicted Annual AveragesCalculating Predicted Annual Averages

Quarter averages:

PQi = SPQi + KPQi

Annual average from quarterly analysis:

Pannual = ( PQi) / 4

Annual average from annual analysis:

Pannual = SPannual + KPannual

i=1

4

Calculation of Standard Error for Calculation of Standard Error for Annual AveragesAnnual Averages Standard errors of quarterly averages:

SEQi = (SEspi)2 + (SEkpi)2

Standard errors of annual averages from quarterly analysis:

SEannual = 1/16 (SEQi)2

Standard errors of annual averages from annual analysis:

SEannual = (SEsp)2 + (SEkp)2

Sources of ErrorSources of Error

• Less than 5% of total errors is coming from fitting a quadratic surface. • Kriging prediction error dominates.

2001

0

0.5

1

1.5

2

2.5

Surface

Kriging

Total

2000

0

0.5

1

1.5

2

2.5

Pre

dict

ion

Sta

ndar

d E

rror

Problems With Quarterly & Annual AnalysisProblems With Quarterly & Annual Analysis

The surface prediction and kriging prediction are not independent.

Var (SP + KP) Var (SP) + Var (KP)

surface prediction

krig

ing

pred

ictio

n

0 5 10 15

-20

24

Annual 2000 SP vs. KP

surface prediction

krig

ing

pred

ictio

n

4 6 8 10 12

-2-1

01

23

4

Annual 2001 SP vs. KP

surface prediction

krig

ing

pre

dic

tion

10 12 14 16

-20

24

2000 Quarter 1 SP vs. KP

surface prediction

krig

ing

pre

dic

tion

0 5 10 15

-20

24


surface prediction

krig

ing

pre

dic

tion

5 10 15

-20

24


surface prediction

krig

ing

pre

dic

tion

0 5 10 15

-4-2

02

46


surface prediction

krig

ing

pred

ictio

n

8 10 12 14 16

-6-4

-20

24


surface prediction

krig

ing

pred

ictio

n

2 4 6 8 10 12 14 16

-4-2

02

4


surface prediction

krig

ing

pred

ictio

n

0 5 10 15

-4-2

02

4


surface prediction

krig

ing

pred

ictio

n

4 6 8 10 12

-20

2


More Problems With Quarterly and Annual AnalysisMore Problems With Quarterly and Annual Analysis

Not using all available data

When kriging residuals, estimated variogram is biased low (Kim and Boos 2002) (This problem could be solved by using generalized least squares.)

Ignored standard deviation of annual and/or quarterly averages in calculation of kriging prediction error

Quarterly averages may not be independent

Methods 3 & 4 - Daily-BasedMethods 3 & 4 - Daily-Based

Used every third day data (122 days per year)

Kriged each day to obtain predictions at 2400 lattice points

At each lattice point fit a timeseries to the 122 days’ estimates to estimate annual average

Calculated timeseries error for annual average using proc arima

Method 3 - “Doug’s Method”Method 3 - “Doug’s Method” Fit a quadratic surface using the Krig function in Splus Used an algorithm that minimizes generalized cross

validation error in order to estimate all parameters--including both quadratic surface parameters and covariance parameters

Did not assume errors iid when fitting quad surf, so coefficients in quad surf estimated based on cov structure

Specified an exponential covariance structure with a nugget

Provided the fixed value of 200 km for range parameter for all 122 days

Method 4 - “Amy’s Method”Method 4 - “Amy’s Method”

Fit a quadratic surface using Generalized Least Squares in SAS Proc Mixed

Restricted (or residual) Maximum Likelihood used to estimate all parameters

Did not assume errors iid when fitting quad surf, so coefficients in quad surf estimated based on cov structure

Specified an exponential covariance structure with a nugget

Estimated each parameter each day

Problems with Doug’s MethodProblems with Doug’s Method

Using the same value for range parameter every day requires assumption that the range parameter is constant over time. Not a valid assumption. Amy’s method does not make this assumption.

Ignored kriging prediction error in calculation of timeseries error for annual average.

Problems with Amy’s MethodProblems with Amy’s Method

REML assumes data for each day is normally distributed. It isn’t. Can fix by using a transformation, but must be careful not to introduce bias in back-transform. There is an unbiased back-transform predictor and an associated estimate of error in Cressie section 3.2.2. Also must decide whether to transform each day using the same function. Doug’s method does not require normality assumption.

Ignored kriging prediction error in calculation of timeseries error for annual average.

What if we “propagate” errors?What if we “propagate” errors?

At a given lattice point we have 122 days’ worth of predictions, each with a kriging prediction error. What if we treat the 122 days as independent observations (they aren’t, they are AR1) and combine the errors accordingly? And we do this for each of our 2400 lattice points.

The Big ProblemThe Big Problem

None of our standard error estimates are correct!

They are all underestimates!We need to learn how to put spatial

error components together with temporal error components.

Model for one dayModel for one day

Yij = o + 1i + 2i2 + 3j + 4j2 + 5ij + ij

Where i = lattitude j = longitudeE(ij) = 0

Cov(ij, I’j’) = 2n + 2e-dist/ i=i’and j=j’

2e-dist/ ii’ or jj’

Model for one siteModel for one site

Yk = + (Yk-1- ) + ek k = 1,…,122

Where E(ek) = 0

Var (ek) = 2

Note: this is an AR1 model. The errors are iid (0, 2) because the temporal correlation is accounted for using the (Yk-1- ) term.

Model for all sites and days?Model for all sites and days?

Yijk = o,k + 1,ki + 2,ki2 + 3,kj + 4,kj2 + 5,kij + ijk

+ eijk

Where E(ijk ) = 0, E(eijk) = 0

We’ve assumed isotropy and stationarity for simplicity.

But how do we model Cov(ijk, I’j’k’), Cov(eijk, ei’j’k’), and Cov (ijk, ei’j’k’)?

SeparabilitySeparability

We’ve been treating the covariance structure as separable--meaning that the 1-D temporal and 2-D spatial covariance structures can be estimated separately and then can be mathematically combined to obtain a 3-D space-time covariance structure. We need to test for separability, and if the covariance components are separable, we need to appropriately combine them. We are just now learning how to do this.

Next Steps….Next Steps….

Re-do Quarterly and Annual analyses using generalized least squares

Perform Amy’s analysis using transformations, making sure to use an unbiased estimator in the back-transform and the appropriate error estimator. How much does the lack of normality in the original analysis affect results?

More next steps….More next steps….

Investigate the separability of the covariance structure and the correct method for combining space and time covariance components.

Attempt a 3-dimensional kriging. No assumption of separability is required to do this. We must, however, write our own code for this project because there is no software package (to our knowledge) that performs such an analysis. This method would allow us to use even more data than we are using now, as we would not be restricted to every third day.

That’s all, folks!That’s all, folks!

Download - FOUR METHODS OF ESTIMATING PM 2.5 ANNUAL AVERAGES Yan Liu and Amy Nail Department of Statistics North Carolina State University EPA Office of Air Quality,

Top Related