FOUR METHODS OF ESTIMATING PM2.5 ANNUAL AVERAGES
Yan Liu and Amy Nail
Department of Statistics
North Carolina State University
EPA Office of Air Quality Planning and Standards
Emissions, Monitoring, and Analysis Division
Project Objectives
Estimation of annual average of PM2.5 concentration
Estimation of standard errors associated with annual average estimates
Estimation of the probability that a site’s annual average exceeds 15 µg/m3
At 2400 lattice points for 2000, 2001
Comparisons of 4 different methodologies:
1. Quarter-based analysis (Yan)
2. Annual-based analysis (Yan)
Daily-based analyses:
3. Doug Nychka’s method (Bill)
4. Generalized least squares in SAS Proc Mixed (Amy)
Why are Standard Errors Important?
We may estimate that the annual average for lattice point 329 is 16 µg/m3, which exceeds the standard of 15. But since our estimate has some uncertainty (its standard error), we’d like to take this uncertainty into account in order to determine the probability that lattice point 329 exceeds 15.
In addition to maps like this ...
... we also want maps like this.
Note: This Map is WRONG--so don’t show it to anyone! We haven’t figured out the correct way to determine errors, so we cannot correctly draw a probability map yet.
Data Description
Concentrations of PM2.5 measured during 2000, 2001
The domain analyzed: the portion of the U.S. east of -100° longitude
Concentrations measured every third day
Map of 2400 Lattice Points
Method 1 – Quarterly Analysis
3 months in each quarter
• Q1 (Jan. - Mar.), Q2 (Apr. - Jun.)
• Q3 (Jul. - Sep.), Q4 (Oct. - Dec.)
Within quarters, required 75% completeness
Found quarter mean conc. at each site
For each quarter, kriged mean conc. over lattice
Averaged the quarter predictions to get annual average estimate
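The site-level part of these steps can be sketched in a few lines of Python. This is only an illustration, not the code used for the analysis: the record layout, the function names, and the choice to judge completeness against an assumed number of scheduled sampling days per quarter are all assumptions made here. The last helper simply averages four quarterly values, which in Method 1 would be the kriged quarterly predictions at one lattice point.

```python
from statistics import mean

def quarter_of(month):
    """Map a month number (1-12) to its quarter (1-4)."""
    return (month - 1) // 3 + 1

def quarterly_means(samples, scheduled_per_quarter, min_completeness=0.75):
    """Return {quarter: mean concentration} for quarters that are complete enough.

    samples: list of (month, concentration) pairs for one site and one year.
    scheduled_per_quarter: assumed number of scheduled sampling days per quarter.
    """
    by_quarter = {q: [] for q in (1, 2, 3, 4)}
    for month, conc in samples:
        by_quarter[quarter_of(month)].append(conc)
    result = {}
    for q, values in by_quarter.items():
        # Keep the quarter only if enough of the scheduled samples were taken.
        if len(values) >= min_completeness * scheduled_per_quarter:
            result[q] = mean(values)
    return result

def annual_from_quarters(quarter_values):
    """Average four quarterly values; return None if any quarter is missing."""
    if len(quarter_values) < 4:
        return None
    return sum(quarter_values[q] for q in (1, 2, 3, 4)) / 4
```

With every-third-day sampling a quarter has roughly 30 scheduled days, so `scheduled_per_quarter=30` would be a plausible setting under that assumption.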
Annual Average Predictions
Method 2 – Annual Analysis
Used sites common to all 4 quarters in quarterly analysis
Found annual mean conc. at each site
Kriged annual mean conc. over lattice
The Number of Sites
             2000   2001
Quarter 1     510    631
Quarter 2     575    642
Quarter 3     619    682
Quarter 4     613    666
Annual        394    517
Model for Quarterly and Annual Analyses
Predicted value = quadratic surface prediction (SP) + error prediction (KP)
Estimating Quadratic Surface
Model: $Conc = \beta_0 + \beta_1\,lat + \beta_2\,lon + \beta_3\,lat^2 + \beta_4\,lon^2 + \beta_5\,lat \cdot lon + \varepsilon$
Assume: 1) $E(\varepsilon) = 0$, $Var(\varepsilon) = \sigma^2 I$ 2) The betas are estimated by SAS assuming errors iid
Fit parameters using ordinary least squares in SAS proc reg
Obtained surface predictions (SP) and their standard errors (SEsp) and the $\varepsilon$’s (residuals)
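For readers without SAS, the ordinary least squares fit of the quadratic surface can be sketched in plain Python. This is a stand-in for proc reg, not a reproduction of it: the function names are invented for the example, the fit solves the normal equations directly, and no standard errors are computed.

```python
def design_row(lat, lon):
    """Regressors of the quadratic surface: 1, lat, lon, lat^2, lon^2, lat*lon."""
    return [1.0, lat, lon, lat * lat, lon * lon, lat * lon]

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_quadratic_surface(lats, lons, conc):
    """Return the six OLS coefficients by solving (X'X) beta = X'y."""
    rows = [design_row(la, lo) for la, lo in zip(lats, lons)]
    p = 6
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, conc)) for i in range(p)]
    return solve(xtx, xty)

def predict_surface(beta, lat, lon):
    """Surface prediction (SP) at one location."""
    return sum(b * x for b, x in zip(beta, design_row(lat, lon)))
```

With raw latitude and longitude values the normal equations can be badly conditioned, so centering the coordinates before fitting is advisable if this sketch is tried on real data.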
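Kriging the Error Surface
Model: $\{\varepsilon(s) : s \in \mathbb{R}^2\}$, $E(\varepsilon(s)) = 0$
$Var(\varepsilon(s) - \varepsilon(s')) = 0$ if $s = s'$
$Var(\varepsilon(s) - \varepsilon(s')) = \sigma_n^2 + \sigma^2 (1 - e^{-dist/\theta})$ if $s \neq s'$
($\sigma_n^2$: nugget, $\sigma^2$: partial sill, $\theta$: range parameter)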
Estimated variogram parameters using nonlinear least squares in Splus
Obtained kriging predictions (KP) and their standard errors (SEkp)
Variogram Models
3 commonly used variogram models:
– Exponential: $\gamma(h) = 1 - \exp(-3h/a)$
– Spherical: $\gamma(h) = 1.5\,(h/a) - 0.5\,(h/a)^3$ if $h \le a$, $\gamma(h) = 1$ otherwise
– Gaussian: $\gamma(h) = 1 - \exp(-3h^2/a^2)$
a: range
h: distance
[Figure: $\gamma(h)$ against $h$ for the spherical, exponential, and Gaussian models, with the range and sill marked]
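The three unit-sill models above translate directly into code. The sketch below is a plain transcription of the formulas on this slide; the function names are chosen here for illustration.

```python
import math

def exponential(h, a):
    """Exponential model: reaches about 95% of the sill at h = a."""
    return 1.0 - math.exp(-3.0 * h / a)

def spherical(h, a):
    """Spherical model: reaches the sill exactly at h = a."""
    if h <= a:
        r = h / a
        return 1.5 * r - 0.5 * r ** 3
    return 1.0

def gaussian(h, a):
    """Gaussian model: reaches about 95% of the sill at h = a."""
    return 1.0 - math.exp(-3.0 * h * h / (a * a))
```

The factor 3 in the exponential and Gaussian forms is what makes `a` a practical range: at `h = a` both give `1 - exp(-3)`, roughly 0.95 of the sill.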
Cross Validation to Select Variogram Model
Idea: temporarily remove the sample value at one location at a time, then estimate that value from the remaining data using each of the variogram models.
Prediction error = observed - predicted
$MSE = \frac{1}{n-1} \sum (\text{prediction error})^2$
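The leave-one-out idea can be sketched as follows. Several simplifications are assumptions of this example rather than features of the analysis: it uses ordinary kriging on the values themselves (the slides krige residuals from a quadratic surface), a unit-sill variogram with no nugget, and a small hand-written linear solver. The MSE uses the same 1/(n-1) divisor as the formula above.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def distance(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def ordinary_krige(points, values, target, variogram):
    """Ordinary kriging prediction at target from (points, values)."""
    n = len(points)
    # Kriging system: variogram matrix bordered by the unbiasedness constraint.
    a = [[variogram(distance(points[i], points[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [variogram(distance(p, target)) for p in points] + [1.0]
    weights = solve(a, b)[:n]  # drop the Lagrange multiplier
    return sum(w * v for w, v in zip(weights, values))

def loo_cv_mse(points, values, variogram):
    """Leave-one-out cross-validation MSE with divisor n - 1."""
    n = len(points)
    total = 0.0
    for k in range(n):
        rest_points = points[:k] + points[k + 1:]
        rest_values = values[:k] + values[k + 1:]
        predicted = ordinary_krige(rest_points, rest_values, points[k], variogram)
        total += (values[k] - predicted) ** 2
    return total / (n - 1)
```

To compare models, one would call `loo_cv_mse` once per candidate variogram function on the same data and keep the model with the smallest value.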
Cross Validation MSE for Three Variogram Models
[Figure: cross-validation MSE for 2000 and 2001, by period (Q1, Q2, Q3, Q4, annual), for the Exponential, Gaussian, and Spherical models]
• Exponential model has the least MSE.
• Conclusion: use Exponential model
Calculating Predicted Annual Averages
Quarter averages:
$P_{Q_i} = SP_{Q_i} + KP_{Q_i}$
Annual average from quarterly analysis:
$P_{annual} = \left( \sum_{i=1}^{4} P_{Q_i} \right) / 4$
Annual average from annual analysis:
$P_{annual} = SP_{annual} + KP_{annual}$
Calculation of Standard Error for Annual Averages
Standard errors of quarterly averages:
$SE_{Q_i} = \sqrt{(SE_{sp_i})^2 + (SE_{kp_i})^2}$
Standard errors of annual averages from quarterly analysis:
$SE_{annual} = \sqrt{\frac{1}{16} \sum_{i=1}^{4} (SE_{Q_i})^2}$
Standard errors of annual averages from annual analysis:
$SE_{annual} = \sqrt{(SE_{sp})^2 + (SE_{kp})^2}$
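A small numerical sketch of these combinations, reading each formula as a root-sum-of-squares. The function names are made up for the example. Both functions assume the components are independent (surface vs. kriging error, and quarter vs. quarter), an assumption the later slides call into question.

```python
import math

def se_quarter(se_sp, se_kp):
    """Combine surface and kriging standard errors for one quarter."""
    return math.sqrt(se_sp ** 2 + se_kp ** 2)

def se_annual_from_quarters(se_quarters):
    """Standard error of the mean of four quarterly predictions."""
    return math.sqrt(sum(se ** 2 for se in se_quarters) / 16.0)
```

For example, four quarters that each have a standard error of 2 give an annual standard error of 1, since the variance of a mean of four independent terms is one sixteenth of the summed variances.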
Sources of Error
• Less than 5% of the total error comes from fitting the quadratic surface.
• Kriging prediction error dominates.
[Figure: prediction standard error for 2000 and 2001, shown for Surface, Kriging, and Total]
Problems With Quarterly & Annual Analysis
The surface prediction and kriging prediction are not independent.
$Var(SP + KP) \neq Var(SP) + Var(KP)$
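For reference, the general identity behind this point is the variance of a sum:

```latex
\operatorname{Var}(SP + KP) = \operatorname{Var}(SP) + \operatorname{Var}(KP) + 2\,\operatorname{Cov}(SP, KP)
```

Adding the two variances is therefore exact only when the covariance term is zero. A positive covariance makes the simple sum too small, and a negative covariance makes it too large.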
[Figure: scatter plots of kriging prediction against surface prediction (SP vs. KP) for Annual 2000, Annual 2001, 2000 Quarters 1-4, and 2001 Quarters 1-4]
More Problems With Quarterly and Annual Analysis
Not using all available data
When kriging residuals, estimated variogram is biased low (Kim and Boos 2002) (This problem could be solved by using generalized least squares.)
Ignored standard deviation of annual and/or quarterly averages in calculation of kriging prediction error
Quarterly averages may not be independent
Methods 3 & 4 - Daily-Based
Used every third day data (122 days per year)
Kriged each day to obtain predictions at 2400 lattice points
At each lattice point fit a timeseries to the 122 days’ estimates to estimate annual average
Calculated timeseries error for annual average using proc arima
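To show what a time-series standard error for an annual average involves, here is a sketch for a stationary AR(1) series. It is not the proc arima calculation. It assumes the AR(1) coefficient `phi` and the innovation variance `sigma2` are known, whereas in practice both would be estimated from the 122 daily values.

```python
import math

def ar1_se_of_mean(phi, sigma2, n):
    """Standard error of the mean of n consecutive values of a stationary AR(1) series.

    phi: AR(1) coefficient, with |phi| < 1.
    sigma2: variance of the iid innovations.
    """
    gamma0 = sigma2 / (1.0 - phi * phi)  # marginal variance of the series
    # Sum of lag-k autocorrelations phi**k, weighted by how often each lag occurs.
    corr_sum = sum((1.0 - k / n) * phi ** k for k in range(1, n))
    return math.sqrt(gamma0 / n * (1.0 + 2.0 * corr_sum))
```

With `phi = 0` this reduces to the familiar `sqrt(sigma2 / n)`. With positive `phi` the result is larger than the value an independence assumption would give for the same marginal variance.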
Method 3 - “Doug’s Method”
Fit a quadratic surface using the Krig function in Splus
Used an algorithm that minimizes generalized cross validation error in order to estimate all parameters, including both quadratic surface parameters and covariance parameters
Did not assume errors iid when fitting quad surf, so coefficients in quad surf estimated based on cov structure
Specified an exponential covariance structure with a nugget
Provided the fixed value of 200 km for range parameter for all 122 days
Method 4 - “Amy’s Method”
Fit a quadratic surface using Generalized Least Squares in SAS Proc Mixed
Restricted (or residual) Maximum Likelihood used to estimate all parameters
Did not assume errors iid when fitting quad surf, so coefficients in quad surf estimated based on cov structure
Specified an exponential covariance structure with a nugget
Estimated each parameter each day
Problems with Doug’s Method
Using the same value for range parameter every day requires assumption that the range parameter is constant over time. Not a valid assumption. Amy’s method does not make this assumption.
Ignored kriging prediction error in calculation of timeseries error for annual average.
Problems with Amy’s Method
REML assumes data for each day is normally distributed. It isn’t. Can fix by using a transformation, but must be careful not to introduce bias in back-transform. There is an unbiased back-transform predictor and an associated estimate of error in Cressie section 3.2.2. Also must decide whether to transform each day using the same function. Doug’s method does not require normality assumption.
Ignored kriging prediction error in calculation of timeseries error for annual average.
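The back-transform bias mentioned above is easiest to see for the log transform. The slides do not say which transformation would be used, so this is an assumed case. If $Z = \log Y$ is normal with mean $\mu$ and variance $\sigma^2$, then:

```latex
E[Y] = E\left[e^{Z}\right] = \exp\left(\mu + \tfrac{1}{2}\sigma^{2}\right) > \exp(\mu)
```

Simply exponentiating a prediction made on the log scale therefore underestimates the mean on the original scale, and the gap grows with the variance. This is why a corrected back-transform predictor is needed.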
What if we “propagate” errors?
At a given lattice point we have 122 days’ worth of predictions, each with a kriging prediction error. What if we treat the 122 days as independent observations (they aren’t; they are AR1) and combine the errors accordingly? And we do this for each of our 2400 lattice points.
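Under the independence assumption described here, the combination would take the following form, where $SE_k$ denotes the kriging prediction standard error at the lattice point on day $k$ (notation introduced for this illustration):

```latex
SE(\bar{Y}) = \frac{1}{122} \sqrt{\sum_{k=1}^{122} SE_k^{2}}
```

Because the daily values are in fact autocorrelated, this expression leaves out the covariance terms between days. If that correlation is positive, the result is too small.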
The Big Problem
None of our standard error estimates are correct!
They are all underestimates!
We need to learn how to put spatial error components together with temporal error components.
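Model for one day
$Y_{ij} = \beta_0 + \beta_1 i + \beta_2 i^2 + \beta_3 j + \beta_4 j^2 + \beta_5 ij + \varepsilon_{ij}$
Where $i$ = latitude, $j$ = longitude, $E(\varepsilon_{ij}) = 0$
$Cov(\varepsilon_{ij}, \varepsilon_{i'j'}) = \sigma_n^2 + \sigma^2 e^{-dist/\theta}$ if $i = i'$ and $j = j'$
$Cov(\varepsilon_{ij}, \varepsilon_{i'j'}) = \sigma^2 e^{-dist/\theta}$ if $i \neq i'$ or $j \neq j'$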
Model for one site
$Y_k = \mu + \phi (Y_{k-1} - \mu) + e_k$, $k = 1, \ldots, 122$
Where $E(e_k) = 0$
$Var(e_k) = \sigma^2$
Note: this is an AR1 model. The errors are iid $(0, \sigma^2)$ because the temporal correlation is accounted for using the $\phi (Y_{k-1} - \mu)$ term.
Model for all sites and days?
$Y_{ijk} = \beta_{0,k} + \beta_{1,k} i + \beta_{2,k} i^2 + \beta_{3,k} j + \beta_{4,k} j^2 + \beta_{5,k} ij + \varepsilon_{ijk} + e_{ijk}$
Where $E(\varepsilon_{ijk}) = 0$, $E(e_{ijk}) = 0$
We’ve assumed isotropy and stationarity for simplicity.
But how do we model $Cov(\varepsilon_{ijk}, \varepsilon_{i'j'k'})$, $Cov(e_{ijk}, e_{i'j'k'})$, and $Cov(\varepsilon_{ijk}, e_{i'j'k'})$?
Separability
We’ve been treating the covariance structure as separable, meaning that the 1-D temporal and 2-D spatial covariance structures can be estimated separately and then can be mathematically combined to obtain a 3-D space-time covariance structure. We need to test for separability, and if the covariance components are separable, we need to appropriately combine them. We are just now learning how to do this.
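One common way to define a separable space-time covariance is as the product of a purely spatial and a purely temporal term. The sketch below illustrates that product form only and is not the combination method the project settled on. The exponential spatial term without a nugget, the AR(1)-style temporal correlation, and the default parameter values are all assumptions chosen for the example.

```python
import math

def spatial_cov(dist, sill=1.0, range_km=200.0):
    """Exponential spatial covariance, no nugget (illustrative values)."""
    return sill * math.exp(-dist / range_km)

def temporal_corr(lag, phi=0.5):
    """AR(1)-style temporal correlation at an integer lag."""
    return phi ** abs(lag)

def separable_cov(dist, lag, sill=1.0, range_km=200.0, phi=0.5):
    """Product-form space-time covariance."""
    return spatial_cov(dist, sill, range_km) * temporal_corr(lag, phi)

def kronecker(a, b):
    """Kronecker product of two matrices stored as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]
```

Under the product form, the covariance matrix for all site-day pairs is the Kronecker product of the temporal and spatial matrices, which is what makes separable models computationally attractive when the assumption holds.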
Next Steps….
Re-do Quarterly and Annual analyses using generalized least squares
Perform Amy’s analysis using transformations, making sure to use an unbiased estimator in the back-transform and the appropriate error estimator. How much does the lack of normality in the original analysis affect results?
More next steps….
Investigate the separability of the covariance structure and the correct method for combining space and time covariance components.
Attempt a 3-dimensional kriging. No assumption of separability is required to do this. We must, however, write our own code for this project because there is no software package (to our knowledge) that performs such an analysis. This method would allow us to use even more data than we are using now, as we would not be restricted to every third day.
That’s all, folks!