TRANSCRIPT
Center for Environmental Research and Technology/Air Quality Modeling
University of California at Riverside
Model Performance Metrics, Ambient Data Sets and Evaluation Tools
USEPA PM Model Evaluation Workshop, RTP, NC February 9-10, 2004
Gail Tonnesen, Chao-Jung Chien, Bo Wang Youjun Qin, Zion Wang, Tiegang Cao
Center for Environmental Research and Technology/Air Quality Modeling
University of California at Riverside
Acknowledgments
• Funding from the Western Regional Air Partnership Modeling Forum and VISTAS.
• Assistance from EPA and others in gaining access to ambient data.
• 12 km plots and analysis from Jim Boylan of the State of Georgia
Outline
• UCR Model Evaluation Software – Problems we had to solve
• Choice of metrics for clean conditions.
• Judging performance for high-resolution nested domains.
Motivation
• Needed to evaluate model performance for WRAP annual regional haze modeling:
– Required a very large number of sites and days
– For several different ambient monitoring networks
• Evaluation would be repeated many times:
– Many iterations on the “base case”
– Several model sensitivity/diagnostic cases to evaluate
• Limited time and resources were available to complete the evaluation.
Solution
• Develop model evaluation software to:
– Compute 17 statistical metrics for model evaluation
– Generate graphical plots in a variety of formats (a minimal sketch follows this list):
  – Scatter plots:
    • All sites for one month
    • All sites for the full year
    • One site for all days
    • One day for all sites
  – Time series for each site
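As a minimal sketch of the pairing-and-plotting step (illustrative Python with hypothetical file and column names; the actual UCR tools are Java-based and differ in detail):

```python
# Sketch (not the UCR tool): pair observations with model values extracted at
# the sites and write one scatter plot per month, with a fractional-bias label.
# The CSV file names and the columns (site, date, obs, mod) are assumptions.
import pandas as pd
import matplotlib.pyplot as plt

obs = pd.read_csv("improve_so4_obs.csv")    # columns: site, date, obs
mod = pd.read_csv("cmaq_so4_at_sites.csv")  # columns: site, date, mod
paired = obs.merge(mod, on=["site", "date"]).dropna()
paired["month"] = pd.to_datetime(paired["date"]).dt.to_period("M")

for month, grp in paired.groupby("month"):
    # Mean fractional bias in percent for this month's paired values.
    fb = 200.0 * ((grp["mod"] - grp["obs"]) / (grp["mod"] + grp["obs"])).mean()
    plt.figure()
    plt.scatter(grp["obs"], grp["mod"], s=10)
    plt.xlabel("Observed SO4 (ug/m3)")
    plt.ylabel("Modeled SO4 (ug/m3)")
    plt.title(f"All sites, {month}  (FB = {fb:.1f}%)")
    plt.savefig(f"scatter_so4_{month}.png")
    plt.close()
```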
Ambient Monitoring Networks
• IMPROVE (The Interagency Monitoring of Protected Visual Environments)
• CASTNET (Clean Air Status and Trends Network)
• EPA’s AQS (Air Quality System) database
• EPA’s STN (Speciation Trends Network)
• NADP (National Atmospheric Deposition Program)
• SEARCH daily & hourly data
• PAMS (Photochemical Assessment Monitoring Stations)
• PM Supersites
Number of Sites Evaluated by Network
Number of sites in the continental US by ambient network:

Network     1999    2002
AQS         1532    1557
CASTNET       74      76
IMPROVE       61     134
NADP         133     176
STN         none      64
SEARCH         8       8
Overlap Among Monitoring Networks
[Diagram: overlap among ambient monitoring networks. EPA PM sites and other monitoring stations from state and local agencies measure, in overlapping combinations: PM2.5 and PM10 mass; O3, SO2, NOx, CO, Pb, and other gases; VOCs; speciated PM2.5 and visibility; and HNO3, NO3, SO4.]
Species Mapping
• Specify how to compare model with data for each network.
• Unique species mapping for each air quality model.
Model vs. Obs. Species Mapping Table

SO4
  IMPROVE: SO4 | SEARCH: PCM1_SO4 | STN: M_SO4
  CMAQ: ASO4J + ASO4I

NO3
  IMPROVE: NO3 | SEARCH: PCM1_NO3 | STN: M_NO3
  CMAQ: ANO3J + ANO3I

NH4
  IMPROVE: 0.375*SO4 + 0.29*NO3 | SEARCH: PCM1_NH4 | STN: M_NH4
  CMAQ: ANH4J + ANH4I

OC
  IMPROVE: 1.4*(OC1 + OC2 + OC3 + OC4 + OP)
  SEARCH: 1.4*PCM3_OC + 1.4*SAF*BackupPCM3_OC | STN: OCM_adj
  CMAQ: AORGAJ + AORGAI + AORGPAJ + AORGPAI + AORGBJ + AORGBI

EC
  IMPROVE: EC1 + EC2 + EC3 - OP | SEARCH: PCM3_EC | STN: EC_NIOSH
  CMAQ: AECJ + AECI

SOIL
  IMPROVE: 2.2*Al + 2.49*Si + 1.63*Ca + 2.42*Fe + 1.94*Ti
  SEARCH: PM25_MajorMetalOxides | STN: Crustal
  CMAQ: A25I + A25J

CM
  IMPROVE: MT - FM
  CMAQ: ACORS + ASEAS + ASOIL

PM25
  IMPROVE: FM | SEARCH: TEOM_Mass | STN: pm2_5frm or pm2_5mass
  CMAQ: ASO4J + ASO4I + ANO3J + ANO3I + ANH4J + ANH4I + AORGAJ + AORGAI + AORGPAJ + AORGPAI + AORGBJ + AORGBI + AECJ + AECI + A25J + A25I

PM10
  IMPROVE: MT
  CMAQ: ASO4J + ASO4I + ANO3J + ANO3I + ANH4J + ANH4I + AORGAJ + AORGAI + AORGPAJ + AORGPAI + AORGBJ + AORGBI + AECJ + AECI + A25J + A25I + ACORS + ASEAS + ASOIL

Bext_Recon (1/Mm)
  IMPROVE: 10 + 3*f(RH)*(1.375*SO4 + 1.29*NO3) + 4*OC + 10*EC + SOIL + 0.6*CM
  CMAQ: 10 + 3*f(RH)*[1.375*(ASO4J + ASO4I) + 1.29*(ANO3J + ANO3I)] + 4*1.4*(AORGAJ + AORGAI + AORGPAJ + AORGPAI + AORGBJ + AORGBI) + 10*(AECJ + AECI) + 1*(A25J + A25I) + 0.6*(ACORS + ASEAS + ASOIL)
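To make the mapping concrete, the table can be expressed as data that the evaluation code applies to each record. The sketch below is illustrative only; the dict-of-arrays inputs (cmaq for model output, o for an observation record) and the function name are assumptions, not the UCR implementation:

```python
# Illustrative species-mapping table as Python data, following the table above.
CMAQ_MAP = {
    "SO4": lambda c: c["ASO4J"] + c["ASO4I"],
    "NO3": lambda c: c["ANO3J"] + c["ANO3I"],
    "NH4": lambda c: c["ANH4J"] + c["ANH4I"],
    "EC":  lambda c: c["AECJ"] + c["AECI"],
    "OC":  lambda c: (c["AORGAJ"] + c["AORGAI"] + c["AORGPAJ"] +
                      c["AORGPAI"] + c["AORGBJ"] + c["AORGBI"]),
}

IMPROVE_MAP = {
    # IMPROVE reconstructs organic mass as 1.4 * (OC1 + OC2 + OC3 + OC4 + OP).
    "OC": lambda o: 1.4 * (o["OC1"] + o["OC2"] + o["OC3"] + o["OC4"] + o["OP"]),
    # NH4 is estimated from sulfate and nitrate: 0.375*SO4 + 0.29*NO3.
    "NH4": lambda o: 0.375 * o["SO4"] + 0.29 * o["NO3"],
}

def model_species(cmaq, name):
    """Collapse CMAQ modal aerosol variables into the observable quantity `name`."""
    return CMAQ_MAP[name](cmaq)
```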
Gaseous compounds, wet deposition, and others
Compound             AQS    NADP    CASTNET      CMAQ Mapping
O3, ppmv             O3                          O3
CO, ppmv             CO                          CO
NO2, ppmv            NO2                         NO2
SO2, ppmv            SO2                         SO2
SO2, ug/m3                          Total_SO2    2211.5*DENS*SO2
HNO3, ug/m3                         NHNO3        2176.9*DENS*HNO3
Total_NO3, ug/m3                    Total_NO3    ANO3J + ANO3I + 0.9841*2211.5*DENS*HNO3
SO4_wdep, kg/ha             WSO4                 ASO4J + ASO4I (from WDEP1)
NO3_wdep, kg/ha             WNO3                 ANO3J + ANO3I (from WDEP1)
NH4_wdep, kg/ha             WNH4                 ANH4J + ANH4I (from WDEP1)
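The 2211.5*DENS*SO2 and 2176.9*DENS*HNO3 entries are consistent with converting a ppmv mixing ratio to ug/m3 using the modeled air density. A small sketch (the function name and molecular-weight constants are illustrative, and the factors agree with the table only to within rounding):

```python
# Hedged sketch of the ppmv -> ug/m3 conversion implied by the CMAQ mapping
# factors above. DENS is air density in kg/m3 from the model meteorology.
MW_AIR = 28.97  # g/mol, approximate

def ppmv_to_ugm3(ppmv, mw_species, dens_kg_m3):
    # 1 ppmv = 1e-6 mol species per mol air; convert by mass ratio and density.
    return ppmv * 1000.0 * (mw_species / MW_AIR) * dens_kg_m3

# Factor check: SO2 (~64.07 g/mol) gives ~2212*DENS and HNO3 (~63.01 g/mol)
# gives ~2175*DENS, close to the 2211.5 and 2176.9 coefficients in the table.
print(1000.0 * 64.07 / MW_AIR)   # ~2211.6
print(1000.0 * 63.01 / MW_AIR)   # ~2175.0
```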
Recommended Performance Metrics?
• No EPA guidance is available for PM.
• Everyone has their own favorite metric.
• Several metrics are not symmetric about zero, which exaggerates over-predictions relative to under-predictions.
• Is the coefficient of determination (R2) a useful metric?
Statistical measures used in model performance evaluation

Accuracy of unpaired peak (A_u):
  $A_u = \dfrac{P_{u,\mathrm{peak}} - O_{\mathrm{peak}}}{O_{\mathrm{peak}}}$
  where $O_{\mathrm{peak}}$ is the peak observation and $P_{u,\mathrm{peak}}$ is the unpaired peak prediction within 2 grid cells of the peak observation site.

Accuracy of paired peak (A_p):
  $A_p = \dfrac{P_{\mathrm{peak}} - O_{\mathrm{peak}}}{O_{\mathrm{peak}}}$
  where $P_{\mathrm{peak}}$ is the peak prediction paired in time and space with the peak observation.

Coefficient of determination ($R^2$):
  $R^2 = \dfrac{\left[\sum_{i=1}^{N}(P_i-\bar{P})(O_i-\bar{O})\right]^2}{\sum_{i=1}^{N}(P_i-\bar{P})^2 \, \sum_{i=1}^{N}(O_i-\bar{O})^2}$
  where $P_i$ is the prediction at time and location $i$, $O_i$ is the observation at time and location $i$, and $\bar{P}$, $\bar{O}$ are the arithmetic averages of $P_i$ and $O_i$, $i = 1, 2, \ldots, N$.

Normalized Mean Error (NME), reported as %:
  $\mathrm{NME} = \dfrac{\sum_{i=1}^{N}\left|P_i-O_i\right|}{\sum_{i=1}^{N}O_i}$

Root Mean Square Error (RMSE):
  $\mathrm{RMSE} = \left[\dfrac{1}{N}\sum_{i=1}^{N}(P_i-O_i)^2\right]^{1/2}$

Mean Absolute Gross Error (MAGE):
  $\mathrm{MAGE} = \dfrac{1}{N}\sum_{i=1}^{N}\left|P_i-O_i\right|$
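A minimal sketch of these error measures for paired prediction/observation arrays (illustrative Python, not the UCR software):

```python
# Error measures from the table above, for paired arrays P (predictions)
# and O (observations) of equal length with no missing values.
import numpy as np

def error_metrics(P, O):
    P, O = np.asarray(P, float), np.asarray(O, float)
    mage = np.abs(P - O).mean()                    # Mean Absolute Gross Error
    rmse = np.sqrt(((P - O) ** 2).mean())          # Root Mean Square Error
    nme  = 100.0 * np.abs(P - O).sum() / O.sum()   # Normalized Mean Error, %
    r2   = (((P - P.mean()) * (O - O.mean())).sum() ** 2 /
            (((P - P.mean()) ** 2).sum() * ((O - O.mean()) ** 2).sum()))
    return {"MAGE": mage, "RMSE": rmse, "NME": nme, "R2": r2}
```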
Statistical measures used in model performance evaluation (continued)

Fractional Gross Error (FE), reported as %:
  $\mathrm{FE} = \dfrac{2}{N}\sum_{i=1}^{N}\dfrac{\left|P_i-O_i\right|}{P_i+O_i}$

Mean Normalized Gross Error (MNGE), reported as %:
  $\mathrm{MNGE} = \dfrac{1}{N}\sum_{i=1}^{N}\dfrac{\left|P_i-O_i\right|}{O_i}$

Mean Bias (MB):
  $\mathrm{MB} = \dfrac{1}{N}\sum_{i=1}^{N}(P_i-O_i)$

Mean Normalized Bias (MNB), reported as %:
  $\mathrm{MNB} = \dfrac{1}{N}\sum_{i=1}^{N}\dfrac{P_i-O_i}{O_i}$

Mean Fractionalized Bias (Fractional Bias, MFB), reported as %:
  $\mathrm{MFB} = \dfrac{2}{N}\sum_{i=1}^{N}\dfrac{P_i-O_i}{P_i+O_i}$

Normalized Mean Bias (NMB), reported as %:
  $\mathrm{NMB} = \dfrac{\sum_{i=1}^{N}(P_i-O_i)}{\sum_{i=1}^{N}O_i}$

Bias Factor (BF):
  $\mathrm{BF} = 1 + \mathrm{MNB}$; reported in ratio notation (prediction : observation).
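A companion sketch for the bias measures above, under the same pairing assumptions (illustrative only):

```python
# Bias measures from the table above, for paired arrays P and O (O > 0).
import numpy as np

def bias_metrics(P, O):
    P, O = np.asarray(P, float), np.asarray(O, float)
    mb   = (P - O).mean()                              # Mean Bias
    mnb  = 100.0 * ((P - O) / O).mean()                # Mean Normalized Bias, %
    mnge = 100.0 * (np.abs(P - O) / O).mean()          # Mean Normalized Gross Error, %
    nmb  = 100.0 * (P - O).sum() / O.sum()             # Normalized Mean Bias, %
    mfb  = 200.0 * ((P - O) / (P + O)).mean()          # Fractional Bias, %
    fe   = 200.0 * (np.abs(P - O) / (P + O)).mean()    # Fractional Gross Error, %
    bf   = 1.0 + mnb / 100.0                           # Bias Factor
    return {"MB": mb, "MNB": mnb, "MNGE": mnge,
            "NMB": nmb, "MFB": mfb, "FE": fe, "BF": bf}
```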
Most Used Metrics
• Mean Normalized Bias (MNB): ranges from -100% to +infinity.
• Normalized Mean Bias (NMB): ranges from -100% to +infinity.
• Fractional Bias (FB): ranges from -200% to +200%.
• Fractional Error (FE): ranges from 0% to +200%.
• Bias Factor (Knipping ratio) is MNB + 1, reported as a ratio, for example:
  – 4:1 for over-prediction
  – 1:4 for under-prediction
  (a small sketch of the ratio notation follows this list)
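As a concrete illustration of the ratio notation (a sketch, not the UCR tools' code):

```python
# Report the bias factor in ratio notation: BF = 1 + MNB (MNB as a fraction);
# BF >= 1 reads "x:1" (over-prediction), BF < 1 reads "1:y" (under-prediction).
def bias_factor_ratio(mnb_percent):
    bf = 1.0 + mnb_percent / 100.0
    if bf >= 1.0:
        return f"{bf:.1f}:1"
    return f"1:{1.0 / bf:.1f}"

print(bias_factor_ratio(300.0))   # MNB = +300%  ->  "4.0:1"
print(bias_factor_ratio(-75.0))   # MNB = -75%   ->  "1:4.0"
```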
UCR Java-based AQM Evaluation Tools
SAPRC99 vs. CB4 cross comparisons – NO3; IMPROVE

            FE%     FB%
SAPRC99    108.4    49.1
CB4        107.4    45.6
CB4-2002   109.2    52.0
SAPRC99 vs. CB4 cross comparisons – SO4; IMPROVE

            FE%     FB%
SAPRC99     54.9     9.4
CB4         56.0    10.2
CB4-2002    56.5    12.4
Time series plot for CMAQ vs. CAMx at SEARCH site – JST (Jefferson St.)
Seasonal model performance by species and network: FE(%) and FB(%) for the US and WRAP domains, summer and winter.

                          --- US Summer ---   --- US Winter ---   -- WRAP Summer --   -- WRAP Winter --
Species     Network       FE(%)     FB(%)     FE(%)     FB(%)     FE(%)     FB(%)     FE(%)     FB(%)
O3          AQS(1)        26.32    -16.26     37.16    -36.52     34.82    -32.42     48.92    -48.92
SO2         CASTNET       66.52     26.09     80.54     70.14     69.88    -39.63     59.67     24.09
SO4         IMPROVE(2)    49.58    -10.84     67.38     47.26     52.85    -22.05     80.42     68.40
SO4         CASTNET       38.54    -11.20     36.33     28.19     62.31    -54.03     57.63     50.35
SO4         SEARCH        54.51     31.57     50.54     25.03     NA(3)               NA
SO4         SEARCH_H      71.49     35.62     69.93     26.67     NA                  NA
SO4         STN           46.59      6.44     39.47      9.71     51.65    -33.60     45.04      8.69
NO3         IMPROVE      129.67    -86.78    102.62     49.02    135.52   -109.90    101.78     46.52
NO3         CASTNET      112.85    -19.28     95.59     78.50    134.70   -116.21     93.74     76.74
NO3         SEARCH       105.18    -58.60    107.74     67.38     NA                  NA
NO3         SEARCH_H     140.47    -96.02    130.47     36.11     NA                  NA
NO3         STN           99.65    -42.43     77.79      9.24    109.78    -88.25     80.83    -49.04
HNO3        CASTNET       54.11    -10.78     68.48    -23.35     79.98    -66.59     60.48     -8.91
Total_NO3   CASTNET       60.12    -14.82     54.38     44.77     89.94    -76.34     50.72     34.24
NH4         IMPROVE(4)    58.81     37.68     88.15     71.80
NH4         CASTNET       42.92     -6.08     71.50     68.83     59.20    -42.36     83.67     77.82
NH4         SEARCH        42.10     -1.87     68.95     48.21     NA                  NA
NH4         SEARCH_H      69.00     30.89    101.48     60.03     NA                  NA
NH4         STN           54.27     27.27     65.99     37.38     56.03      2.45     78.82     13.48

(1) With 60 ppb ambient cutoff. (2) Using 3*elemental sulfur. (3) No data available in WRAP domain. (4) Measurements available at 3 sites.
Viewing Spatial Patterns
• Problem: Model performance metrics and time-series plots do not identify cases where the model is “off by one grid cell”.
• Process ambient data into the I/O API format so that the data can be compared with the model using PAVE (a gridding sketch follows).
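One simple way to place site observations onto the model grid for this kind of side-by-side viewing (a sketch only; it does not use the I/O API itself, and the array and function names are assumptions):

```python
# Assign each monitoring site to its nearest model grid cell so observed
# values can be written onto a 2-D field for display alongside model output.
# grid_lat and grid_lon are 2-D arrays of cell-center coordinates.
import numpy as np

def nearest_cell(site_lat, site_lon, grid_lat, grid_lon):
    # Simple squared-distance in degrees with a latitude correction;
    # adequate for picking the containing/nearest cell.
    d2 = (grid_lat - site_lat) ** 2 + \
         ((grid_lon - site_lon) * np.cos(np.radians(site_lat))) ** 2
    j, i = np.unravel_index(np.argmin(d2), grid_lat.shape)
    return j, i

def obs_to_grid(sites, grid_lat, grid_lon, fill=-9999.0):
    """sites: iterable of (lat, lon, value); returns a 2-D field of observations."""
    field = np.full(grid_lat.shape, fill)
    for lat, lon, value in sites:
        j, i = nearest_cell(lat, lon, grid_lat, grid_lon)
        field[j, i] = value
    return field
```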
IMPROVE SO4, Jan 5
IMPROVE SO4, June 10
IMPROVE NO3, Jan 5
IMPROVE NO3, July 1
IMPROVE SOA, Jan 5
IMPROVE SOA, June 25
Spatially Weighted Metrics
• PAVE plots qualitatively indicate error relative to spatial patterns, but do we also need to quantify this? (see the sketch after this list)
  – A wind error of 30 degrees can cause the model to miss the peak by one or more grid cells.
  – Interpolate the model using surrounding grid cells?
  – Use the average of adjacent grid cells?
  – Within what distance?
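One way such a neighborhood comparison could be quantified (an illustrative sketch; the 3x3 radius and the matching rules are assumptions, not a WRAP or VISTAS recommendation):

```python
# Two neighborhood options discussed above: compare the site observation to
# (a) the average of the block of cells around the site's cell, or
# (b) the single best-matching cell within that block.
import numpy as np

def neighborhood(field, j, i, radius=1):
    jlo, jhi = max(j - radius, 0), min(j + radius + 1, field.shape[0])
    ilo, ihi = max(i - radius, 0), min(i + radius + 1, field.shape[1])
    return field[jlo:jhi, ilo:ihi]

def neighborhood_average(field, j, i, radius=1):
    return float(neighborhood(field, j, i, radius).mean())

def best_match(field, j, i, obs_value, radius=1):
    block = neighborhood(field, j, i, radius)
    return float(block.flat[np.abs(block - obs_value).argmin()])
```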
Judging Model Performance
• Many plots and metrics – but what is the bottom line?
• Need to stratify the data for model evaluation:
  – Evaluate seasonal performance.
  – Group by related types of sites.
  – Judge the model for each site or for similar groups of sites.
  – How best to group or stratify sites?
• Want to avoid wasting time analyzing plots and metrics that are not useful.
12 km vs. 36 km, Winter SO4

         FB%
36 km    -35
12 km    -39
12 km vs. 36 km, Winter NO3

         FB%
36 km    -34
12 km    -13
Recommended Evaluation for Nests
• Comparing performance metrics is not enough:
  – Performance metrics show a mixed response.
  – It is possible for a better model to have poorer metrics.
• Diagnostic analysis is needed to compare nested grid to coarse grid model.
Example Diagnostic Analysis
• Some sites had worse metrics for the 12 km grid.
• Analysis by Jim Boylan comparing differences in 12 km and 36 km results showed major effects from:
– Regional precipitation
– Regional transport (wind speed & direction)
– Plume definition
Sulfate Change (36 km – 12 km)
Wet Sulfate on July 9 at 01:00 (36 km Grid vs. 12 km Grid)
Regional Transport (Wind Speed)
Sulfate on July 9 at 05:00 (36 km Grid vs. 12 km Grid)
Sulfate on July 9 at 06:00 (36 km Grid vs. 12 km Grid)
Sulfate on July 9 at 07:00 (36 km Grid vs. 12 km Grid)
Sulfate on July 9 at 08:00 (36 km Grid vs. 12 km Grid)
Plume Definition and Artificial Diffusion
Sulfate on July 10 at 00:00 (36 km Grid vs. 12 km Grid)
Sulfate on July 10 at 06:00 (36 km Grid vs. 12 km Grid)
Sulfate on July 10 at 09:00 (36 km Grid vs. 12 km Grid)
Sulfate on July 10 at 12:00 (36 km Grid vs. 12 km Grid)
Sulfate on July 10 at 16:00 (36 km Grid vs. 12 km Grid)
Sulfate on July 10 at 21:00 (36 km Grid vs. 12 km Grid)
Sulfate on July 11 at 00:00 (36 km Grid vs. 12 km Grid)
Sulfate Change (36 km – 12 km)
Nested Grid Recommendations
• Diagnostic evaluation is needed to judge nested grid performance.
• Coarse grid might have compensating errors that produce better performance metrics.
• Diagnostic evaluation is resource intensive.
• Should we just assume that higher resolution implies better physics?
Conclusions – Key Issues
• Air quality models should include a model evaluation module that produces performance plots and metrics.
• Recommend the bias factor as the best metric for haze.
• Much more work is needed to address error relative to spatial patterns.
• If different models have similar error, use the model with the best science (even if it is more computationally expensive).
Additional Work on Evaluation Tools
• Need to adapt evaluation software for PAMS and PM Supersites.
• Develop a GUI to facilitate viewing of plots, including open-source tools for spatial animations.
• Develop software to produce more useful plots, e.g., contour plots of bias and error.