Validation of Coupled Models
Richard M. Hodur
Naval Research Laboratory, Monterey, CA 93943-5502

Short Course on Significance Testing, Model Evaluation, and Alternatives
11 January 2004, Seattle, WA

Outline: Introduction • Atmospheric Models • Ocean Models • Concluding Remarks
Validation of Coupled Models: Acknowledgements
• Dr. James D. Doyle (NRL MRY)
• Dr. Timothy F. Hogan (NRL MRY)
• Dr. Xiaodong Hong (NRL MRY)
• Dr. John C. Kindle (NRL SSC)
• Dr. Paul May (CSC/NRL MRY)
• Dr. Jason E. Nachamkin (NRL MRY)
• Dr. Randy Pauley (FNMOC)
• Dr. Ruth H. Preller (NRL SSC)
• Dr. Julie D. Pullen (NRL MRY)
• Dr. Robert C. Rhodes (NRL SSC)
• Dr. Douglas L. Westphal (NRL MRY)
Validation of Coupled Models: Context of Talk
• Validation is more than a skill score
• Validation implies a learning process:
  • Develop the system
  • Measure the skill of the system
  • Seek ways to improve the skill of the system
• Validation is a critical component of model development
• Without validation there can be no improvement in model performance
How Do We Measure the Validity and Usefulness of Atmosphere and Ocean Models?
A Combination of Many Measures

• Scientific Basis
  • Record of publications, presentations, patents, . . .
  • Equations, grid structure, numerical techniques, and representation of physical processes based on well-tested, peer-reviewed principles
• Reproduction of Analytic/Idealized Test Cases
  • Validate numerical schemes (e.g., topographic flow, PGF computation)
  • Validate physical parameterizations (e.g., Wangara, convection)
• Measure Real-Time Predictive Performance
  • No single simple "metric" is available
  • Objective measurements are useful (e.g., RMS, bias, anomaly correlation, tropical cyclone forecast position error, precipitation scores)
  • May be difficult to measure on the mesoscale; perform subjective evaluation of differing episodic events (e.g., patterns, trends, drifters, transport, tracers)
  • Measure skill over long time periods (months or more)
  • Transitions require measuring the skill of the new version of the system relative to a benchmark version
• Measure Utility to User(s)
  • Does the output meet user needs? User feedback
  • Robustness
  • Efficiency (e.g., wall time, flops, . . . )
Validation of Coupled Models: Outline
• Atmospheric Models
  • Global
    • Anomaly Correlation
    • RMS, Bias Errors
    • Tropical Cyclone Track
    • Scorecard
  • Mesoscale
    • Idealized Flow Studies
    • RMS, Bias
    • Qualitative Verification (Case Studies)
    • Event-Based Verification
• Ocean Models
  • Global
    • Features/Positions
    • Sea Surface Height Validation
    • Anomaly Correlation
    • Sea Surface Temperature Validation
  • Mesoscale
    • Transport
    • Sea Level Height
    • T/S Profiles
    • Coastal Issues
    • Coupling Issues
NOGAPS Annual Mean Forecast Statistics
Anomaly Correlation* of 500 mb Heights (Values Greater than 0.6 are Considered Skillful)
NOGAPS: Navy Operational Global Atmospheric Prediction System

*Anomaly Correlation:

  AC = \frac{\sum (f - c)(a - c)}{\sqrt{\sum (f - c)^2 \, \sum (a - c)^2}}

  where f = forecast, c = climatology, a = analysis

[Figure: anomaly correlation (0.6-1.0) of 500 mb heights vs. forecast length (24, 48, 72, 96, 120 h), color-coded by year; Northern Hemisphere 1988-2003 and Southern Hemisphere 1995-2003]

Results indicate that forecast skill is improving at a rate of about one day per decade.
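A minimal sketch of how this anomaly correlation could be computed from gridded fields (the array names and unweighted sums are assumptions; operational verification typically applies area weighting):

```python
import numpy as np

def anomaly_correlation(forecast, analysis, climatology):
    """Centered anomaly correlation: correlate the forecast anomaly (f - c)
    with the verifying analysis anomaly (a - c), with anomalies taken
    relative to climatology."""
    f = forecast - climatology
    a = analysis - climatology
    return np.sum(f * a) / np.sqrt(np.sum(f**2) * np.sum(a**2))
```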
RMS and Bias Errors
NOGAPS: Navy Operational Global Atmospheric Prediction System

Errors can be stratified by latitude band, hemisphere, or a specified geographic area.

[Figures: wind speed and vector wind errors over CONUS; temperature mean and RMS errors over Europe]

The largest wind errors are typically found at jet level, where the winds are the strongest.
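A minimal sketch of the bias, RMS, and vector wind error statistics referenced above (function and variable names are illustrative):

```python
import numpy as np

def bias_and_rmse(forecast, observed):
    """Mean error (bias) and root-mean-square error of a forecast
    against matched observations or analyses."""
    err = np.asarray(forecast) - np.asarray(observed)
    return err.mean(), np.sqrt((err**2).mean())

def rms_vector_wind_error(u_f, v_f, u_o, v_o):
    """RMS vector wind error: the RMS magnitude of the
    (forecast - observed) wind vector."""
    return np.sqrt(((u_f - u_o)**2 + (v_f - v_o)**2).mean())
```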
NOGAPS Operational 48 h TC Track Forecast Error, All Basins
NOGAPS: Navy Operational Global Atmospheric Prediction System

[Figure: time series of the annual mean 48-hour tropical cyclone track error (nautical miles); number of forecasts per year: 308, 304, 264, 210, 286, 325, 550]

In 2002, NOGAPS was the best-performing global model for tropical cyclone prediction (JTWC). The improvements in skill were largely due to the transition of improvements to the cumulus convection scheme and the increase in resolution. The DoD 48-hour goal is 100 nautical miles.
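Track error at a given lead time is the great-circle distance between the forecast position and the verifying best-track position; a minimal sketch (the haversine form and the function name are assumptions):

```python
import math

def track_error_nmi(lat_f, lon_f, lat_o, lon_o):
    """Great-circle distance (nautical miles) between a forecast tropical
    cyclone position and the verifying best-track position (degrees)."""
    phi_f, phi_o = math.radians(lat_f), math.radians(lat_o)
    dphi = phi_o - phi_f
    dlam = math.radians(lon_o - lon_f)
    a = (math.sin(dphi / 2)**2
         + math.cos(phi_f) * math.cos(phi_o) * math.sin(dlam / 2)**2)
    central_angle = 2 * math.asin(math.sqrt(a))
    return central_angle * 3440.065   # mean Earth radius in nautical miles
```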
NOGAPS Scorecard
NOGAPS Scoring System: Used for Comparing a Benchmark Run with a New Version

Each element of the scorecard is measured over a period of at least three weeks and is required to meet a statistical significance test of at least 95%. Each element is scored as a win, loss, or tie based on the 95% significance level and, in the case of RMS errors, a minimum difference of 5%. A net score of -1 (or higher) out of a possible +/-15 is considered a neutral (or better) overall result.

Implementation will not proceed if there is significant degradation in the TC tracks, even if the rest of the scorecard is positive.

Forecast vs. Self-Analysis:

  Level     Variable  Metric  Region   Tau  Weight
  500 mb    Z         AC      NH       96   3
  500 mb    Z         AC      SH       96   1
  1000 mb   Z         AC      NH       96   1
  1000 mb   Z         AC      SH       96   1
  850 mb    U/V       RMS     Tropics  72   1*
  250 mb    U/V       RMS     Tropics  72   1

Forecast vs. Radiosonde (400 high-quality stations):

  Level     Variable  Metric  Region   Tau  Weight
  500 mb    Z         RMS     Global   96   1
  50 mb     Z         RMS     Global   96   1
  850 mb    T         RMS     Global   96   1
  250 mb    T         RMS     Global   96   1
  850 mb    U/V       RMS     Global   96   1
  250 mb    U/V       RMS     Global   96   1

* Weighted double if no TC track verification is available.
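A minimal sketch of the win/loss/tie bookkeeping described above; the significance test itself is abstracted into a boolean, and the example numbers are hypothetical:

```python
def score_element(new_better, significant_95, weight):
    """Score one scorecard element: +weight for a statistically significant
    win by the new version, -weight for a significant loss, 0 for a tie."""
    if not significant_95:
        return 0
    return weight if new_better else -weight

# Hypothetical illustration with three elements (weights 3, 1, 1):
elements = [(True, True, 3), (False, True, 1), (True, False, 1)]
net = sum(score_element(*e) for e in elements)   # +3 - 1 + 0 = 2
print(net, "-> neutral or better" if net >= -1 else "-> degraded")
```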
Idealized Flow Studies
COAMPS: Coupled Ocean/Atmosphere Mesoscale Prediction System
WRF: Weather Research and Forecast Model

Mountain Wave Test Case: Linear Hydrostatic Gravity Waves
T = 250 K, U = 20 m s^-1, h_m = 1 m, a = 10 km
dx = 2 km, dz = 250 m, 121 levels
Na/U = 5, Nh_m/U = 1x10^-4

Terrain (witch of Agnesi):

  h(x) = \frac{h_m a^2}{x^2 + a^2}

[Figure: vertical velocity w (m/s x 10^-3, range -4 to 4) at 8 h from COAMPS and WRF (EM), compared with the linear analytic solution]
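A minimal sketch of the test-case terrain, using the parameters above (the domain extent is an assumption):

```python
import numpy as np

def agnesi(x, h_m=1.0, a=10.0e3):
    """Witch-of-Agnesi terrain height (m): h(x) = h_m * a^2 / (x^2 + a^2),
    for mountain height h_m (m) and half-width a (m)."""
    return h_m * a**2 / (x**2 + a**2)

x = np.arange(-100.0e3, 100.0e3 + 2.0e3, 2.0e3)   # +/-100 km at dx = 2 km
h = agnesi(x)                                      # peaks at h_m = 1 m at x = 0
```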
COAMPS RMS, Bias Verification
Europe: 27 km Grid

[Figures: COAMPS 24-h surface RMS errors (10 m RH (%); 10 m T (ºC) and 2 m U (m s^-1)) as time series by year (November); COAMPS wind RMS and bias errors]
Subjective Evaluation of COAMPS (27 km) and NOGAPS (T159) Forecasts of the Mistral
27-hour forecasts of 10 m wind valid at 0300 UTC 22 Aug 1998

[Figures: full 27 km COAMPS grid with display area; COAMPS and NOGAPS 10 m wind speed (kts, 0-40); SSM/I wind speeds at 0454 UTC 22 Aug 1998]
Improvement of Aerosol Prediction Capability
Validation of NAAPS Using SeaWiFS and AERONET Data
AERONET: observation network of sun photometers

[Figures: NAAPS vertical integral of extinction; SeaWiFS image, 30 October 2001]
Event-Based Verification: Why Verify Events?
• The user wants a deterministic answer
• The model produces a deterministic forecast
• Unfortunately, the outcome is not deterministic!
• Verification should communicate the nature of the variability
Event-Based Verification: Composite Verification Method
• Identify events of interest in the forecasts
  • Rainfall greater than 25 mm
  • Wind greater than 10 m/s
  • Event contains between 50 and 500 grid points
• Define a kernel and collect coordinated samples (see the sketch below)
  • Square box located at the center of the event
  • 31x31 grid points (837x837 km for a 27 km grid)
• Compare the forecast PDF to the observed PDF
• Repeat the process for observed events

Event-Based Verification: Collecting the Samples
[Diagram: square collection kernel centered on a forecast event (x marks the event center), overlaid on the corresponding observations]
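A minimal sketch of the event identification and kernel sampling, assuming 2-D NumPy fields; connected-component labeling stands in for however events are actually delineated, and all names are illustrative:

```python
import numpy as np
from scipy import ndimage

def event_kernel_samples(field, threshold=25.0, min_pts=50, max_pts=500,
                         half_width=15):
    """Find contiguous events where `field` exceeds `threshold`, then cut a
    (2*half_width + 1)-point square kernel centered on each qualifying event
    (31x31 points for half_width = 15)."""
    labels, n_events = ndimage.label(field > threshold)
    samples = []
    for k in range(1, n_events + 1):
        pts = np.argwhere(labels == k)
        if not (min_pts <= len(pts) <= max_pts):
            continue                                   # keep 50-500 point events
        ci, cj = pts.mean(axis=0).round().astype(int)  # event center
        i0, j0 = ci - half_width, cj - half_width
        i1, j1 = ci + half_width + 1, cj + half_width + 1
        if i0 >= 0 and j0 >= 0 and i1 <= field.shape[0] and j1 <= field.shape[1]:
            samples.append(field[i0:i1, j0:j1])        # coordinated sample
    return samples   # forecast/observed PDFs are then built from these
```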
Event-Based Verification: Mistral Speed Statistics
• 66-hour wind speed forecasts for 2000-01 over the Mediterranean Sea
• Speed greater than 12 m/s, direction 270-70 deg., covering 50-500 grid points
• Verified against SSM/I satellite observations

[Figures: composite FCST (shade) vs. OBS (contour); RMS (shade) and bias (contour)]
Event-Based Verification: CONUS Warm Season Precipitation
• 24-hour precipitation forecasts for April-September 2003 over the full CONUS
• Rain events greater than 25 mm covering 50-500 grid points
• Verified against the River Forecast Center precipitation analysis

[Figures: FCST (shade) vs. OBS (contour) composites of average rain (mm), given an event was predicted and given an event was observed]
Validation of Gulf Stream Position in the Navy Layered Ocean Model (NLOM)
Validation of Eddy Kinetic Energy (EKE) in 1/8-degree Global NCOM
NCOM: Navy Coastal Ocean Model
Mean EKE at 700 m depth during 1998-2000

[Figures: eddy kinetic energy (cm^2/s^2) from free-running global NCOM and from global NCOM using data assimilation; climatological eddy kinetic energy near 700 m depth in the western North Atlantic, taken from Schmitz (1996), which adapted data from Owens (1984, 1991) and Richardson (1993)]

In comparison to the free-running case, EKE at 700 m in the assimilative case is generally higher and in closer agreement with the historical observations, showing the two regions of relatively high EKE south of Nova Scotia and Newfoundland.
Validation of the Navy Layered Ocean Model (NLOM)
Anomaly Correlation: 42 30-day forecasts from 20 Dec 2000 to 24 Oct 2001
Blue line: persistence; red line: NLOM

[Figures: SSH and SST anomaly correlation for the global domain, the Kuroshio, and the Gulf Stream]
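In these comparisons the baseline is persistence: the initial analysis held fixed and verified at each lead time with the same anomaly correlation. A minimal sketch, reusing the `anomaly_correlation` function sketched earlier (the array layout is an assumption):

```python
import numpy as np

def skill_vs_persistence(forecasts, analyses, climatology):
    """Anomaly correlation of model forecasts and of persistence (the day-0
    analysis held fixed) at each lead time.

    forecasts, analyses: arrays of shape (n_leads, ny, nx), where
    analyses[0] is the verifying analysis at the initial time."""
    persistence = analyses[0]
    model_ac, persist_ac = [], []
    for lead in range(forecasts.shape[0]):
        model_ac.append(
            anomaly_correlation(forecasts[lead], analyses[lead], climatology))
        persist_ac.append(
            anomaly_correlation(persistence, analyses[lead], climatology))
    return np.array(model_ac), np.array(persist_ac)
```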
NOGAPS/POP Air-Ocean Coupling
Air-Ocean with a Data Assimilation/Forecast Cycle
OMVOI: Ocean Multivariate Optimum Interpolation Analysis
POP: Parallel Ocean Program Prediction Model

[Diagram: coupled cycle with NAVDAS and NOGAPS on the atmosphere side, OMVOI and POP on the ocean side]

• SST predictions from the POP model using NOGAPS forcing verify better than persistence forecasts
• Reduced errors demonstrate the importance of the model to data assimilation
• Analysis-only produces significant errors in coastal boundary currents
Validation of the Intra-Americas Sea Nowcast/Forecast System (IASNFS)
Run daily to 72 h (http://www7320.nrlssc.navy.mil/IASNFS_WWW/)

• MODAS: Modular Ocean Data Assimilation System (a generic OI sketch follows below)
  • 2D optimum interpolation analysis
  • Synthetic T/S profiles generated and used as observations
  • All observations assimilated during a 12-hour pre-forecast period
• Domain/Bathymetry: [figure]
• NCOM: Navy Coastal Ocean Model
  • 1/24-degree grid spacing
  • 40 vertical levels (20 sigma / 20 z)
  • NOGAPS forcing
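A minimal sketch of the generic optimum interpolation (OI) update behind analyses of this kind; this is the textbook form, not NRL's implementation, and all names are illustrative:

```python
import numpy as np

def oi_analysis(xb, y, H, B, R):
    """Optimum interpolation / statistical analysis update:
        xa = xb + K (y - H xb),  with gain  K = B H^T (H B H^T + R)^-1
    xb: background state (n,); y: observations (m,);
    H: observation operator (m, n); B, R: background and observation
    error covariance matrices."""
    K = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)   # gain matrix
    return xb + K @ (y - H @ xb)                   # analysis state
```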
Validation of the Intra-Americas Sea Nowcast/Forecast System (IASNFS)
NCOM Predicted Transport (2001 Mean) vs. Observations
Key: IASNFS/observation

[Map: transport pairs at the monitored passages: 25.9/28.0, 26.0/28.0, 25.2/28.8, 28.2/31.7, 3.2/2.9, -0.8/1.0, 4.2/7.0, 1.6/2.6, 4.8/2.5, 1.9/3.1, 1.1/1.1, 3.0/1.6, 2.1/1.5, 3.2/2.9, 3.2/5.7]
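A minimal sketch of how a mean volume transport through a model section could be diagnosed for such a comparison (grid variable names are illustrative; 1 Sv = 10^6 m^3/s):

```python
import numpy as np

def section_transport_sv(u_normal, dz, dy):
    """Volume transport (Sv) through a section: integrate the velocity
    component normal to the section over depth and width.

    u_normal: (nz, ny) normal velocity on the section (m/s);
    dz: (nz,) layer thicknesses (m); dy: (ny,) cell widths (m)."""
    transport_m3s = np.einsum('zy,z,y->', u_normal, dz, dy)
    return transport_m3s / 1.0e6   # 1 Sv = 10^6 m^3/s
```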
Validation of the IASNFS Predictions of Sea Level Height
Comparison to Tide Gauges
Validation of the IASNFS Predictions of Sea Level Height
Comparison to Persistence
Validation of the IASNFS Temperature and Salinity (T/S) Profiles
Comparisons to (non-assimilated) CTD data
Red line: CTD observation; blue line: IASNFS

[Map and profiles: CTD station locations marked with x]
Air-Ocean Coupling: Coastal Issues
Validation of Wind Stress for the 9 km Nest in the Eastern Pacific (EPAC)

Black line: stress calculated from observations
Blue line: stress from operational COAMPS interpolated to a lat/lon grid
Red line: stress from the COAMPS reanalysis on its native grid

[Figure: nested 81 km / 27 km / 9 km COAMPS grids over the Eastern Pacific, with wind stress time series. Courtesy of John Kindle, NRL SSC.]

Results indicate that unfiltered, native-grid fields are required for proper forcing and validation along coasts.
Ocean-Atmosphere Nested Modeling of the Adriatic Sea during Winter and Spring 2001
Meteorology and Oceanography in the Adriatic
Collaboration with the Adriatic Circulation Experiment (ACE)

• Atmosphere:
  • Bora: strong, localized northeasterly winds around the Istrian peninsula
  • Scirocco: strong, warm southeast winds
• Ocean:
  • Cyclonic cells in the central and southern regions
  • River runoff and strong winds create large variability in the northern Adriatic

[Map: Bora wind region and the Po River]
Approach:
1. Generate 27 km atmospheric forcing fields over the Mediterranean
2. Generate a 6 km, 2-year spin-up of the Mediterranean using the forcing from #1, then 12-hour data assimilation for October 1999
3. Generate 4 km atmospheric forcing fields over the Adriatic Sea
4. Generate 2 km Adriatic forecasts using initial conditions and inflow from #2 and atmospheric forcing from #3

Objectives:
• Simulate Adriatic atmospheric and oceanic circulation at high resolution
• Document and understand the response of the shallow northern Adriatic waters to forcing by the Bora and Po River run-off
• Quantify the effects of coupling (e.g., one-way, two-way, frequency, resolution) on atmosphere and ocean forecasts
• Aid in planning and interpreting Adriatic Circulation Experiment (ACE) observations

[Diagram of steps 1-4: COAMPS™ 81/27 km nests over the Mediterranean force the 6 km NCOM; COAMPS™ 36/12/4 km nests over the Adriatic force the 2 km NCOM with momentum and heat fluxes; the 6 km NCOM supplies initial conditions and lateral boundary forcing to the 2 km NCOM]
Ocean-Atmosphere Nested Modeling of the Adriatic Sea during Winter and Spring 2001
COAMPS Wind Stress (Mean and RMS Vector Amplitude), 28 January - 4 June 2001
Ocean-Atmosphere Nested Modeling of the Adriatic Sea during Winter and Spring 2001
COAMPS/NCOM Model Circulation: EOFs (NCOM: 2 km grid spacing)

[Figures: COAMPS wind stress curl mode 1; NCOM 5 m velocity mode 1; NCOM 25 m velocity mode 1; each shown with 36 km forcing and with 4 km forcing]
Comparison of modeled 10 m winds to observations (top) and modeled 25 m ocean currents to observations (bottom), using 36 km (blue) and 4 km (red) atmospheric forcing.

Results:
(1) The 4 km and 36 km winds have similar correlations with the observations
(2) The ocean model performs better with the 4 km winds

These results suggest that the effect on an ocean model should be a metric in the validation of atmospheric models, and that high-resolution forcing fields improve ocean forecasts.
Importance of Temporal Resolution of Ocean Forcing
Comparison of NCOM runs using 1 h, 6 h, and 12 h COAMPS™ forcing

[Figure: time series in which the 12 h frequency runs separate from the 1 h and 6 h frequency runs]

Preliminary results suggest that significant differences exist when forcing an ocean model at 12 h frequency as opposed to 1 h or 6 h frequency.
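A minimal sketch of this kind of forcing-frequency experiment: subsample an hourly forcing series to a coarser update interval and re-interpolate it onto the ocean model's time axis (the linear interpolation and all names are assumptions):

```python
import numpy as np

def subsampled_forcing(tau_hourly, every_n_hours, t_model_hours):
    """Subsample an hourly forcing series (e.g., wind stress) to a coarser
    update frequency, then linearly interpolate back onto the ocean model's
    time axis (hours)."""
    t_hourly = np.arange(tau_hourly.size)          # 0, 1, 2, ... hours
    t_coarse = t_hourly[::every_n_hours]           # e.g., every 6 or 12 h
    tau_coarse = tau_hourly[::every_n_hours]
    return np.interp(t_model_hours, t_coarse, tau_coarse)

# Hypothetical use: force the same ocean run three ways and compare.
t_model = np.arange(0.0, 72.0, 1.0 / 6.0)                 # 72 h, 10-min steps
tau_1h = np.random.default_rng(0).normal(0.1, 0.02, 73)   # stand-in series
tau_6h = subsampled_forcing(tau_1h, 6, t_model)
tau_12h = subsampled_forcing(tau_1h, 12, t_model)
```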
Real-Time COAMPS Support for AOSN II
AOSN II: Adaptive Ocean Sampling Network II

• Twice-daily forecasts to 72 h with data assimilation
• NOGAPS lateral boundary conditions
• SGI Origin 3900 at the FNMOC DoD HPC DC facility
• Real-time winds and fluxes used to force multiple ocean models

[Figures: quadruple-nest grid (81, 27, 9, and 3 km) built for the AOSN area; sample 10 m wind speeds (0-12 m/s) from the inner three meshes (27, 9, and 3 km)]
Concluding Remarks: Atmospheric Model Validation
• Many tools available:
  • RMS, bias
  • Anomaly correlation
  • Idealized tests
  • Threat scores
  • Event-based validation
  • Qualitative/quantitative case studies
• Long-term studies are mandatory:
  • Avoid simplistic answers from a single "case study"
  • Minimum requirements for evaluation of systems:
    • 2-week periods for summer and winter
    • Tests over several different geographical areas
• Simple questions, complex answers, in validation:
  • Grid structures
  • Formulation of dynamics
  • Physical parameterizations/interactions
  • Data assimilation issues (e.g., QC, analysis techniques, initialization, first guess)
  • Sensitivity in specific grid-point validation
  • Representativeness of what is being validated (i.e., resolution)
  • Bugs (validation requires an understanding of the code; no "black box" mentality)
Concluding Remarks: Ocean/Coupled Model Validation
• Many tools available:
  • RMS, bias
  • Anomaly correlation
  • Qualitative/quantitative case studies
  • Idealized test cases
• Validation/performance affected by the atmospheric model:
  • Resolution (spatial and temporal)
  • Grid: native vs. interpolated/filtered
• Long-term studies are mandatory
• Unique validation parameters:
  • Transport
  • Sea surface height
  • Tides
• Simple questions, complex answers, in validation, as in the atmospheric models
Concluding Remarks: Challenges
• Demonstrating improved skill is becoming more difficult:
  • Models have improved tremendously
  • Modeling systems are much more complex
  • Requires a thorough understanding of the model(s); no "black box" mentality
• More validation metrics are needed, especially for mesoscale modeling:
  • Higher resolution does not always translate into improved skill scores
  • Phase/pattern-shifting validation?
• Expect a dramatic increase in remotely sensed data; how should it be applied to model validation?
• Coupled modeling complicates the validation process:
  • Air/ocean interactions and feedbacks
  • What if atmosphere forecasts are better (worse) and ocean forecasts are worse (better)?
  • Additional resources are needed
• Commit more resources to validation (and to preparing efficient code)
Concluding Remarks: Lessons Learned from Model Validation/Development
• Important, do right:
  • Listen to the customer
  • Data assimilation
  • Configuration management
  • Lower boundary condition
  • Physical parameterizations
  • Validation/verification
  • Efficiency
  • Be creative; build flexibility into the system
• Important, don't do wrong:
  • Numerics
  • Grid configuration/flexibility/relocatability
  • Upper and lateral boundary conditions
  • Horizontal diffusion
  • Database issues:
    • Portability
    • Resolution (terrain, coastlines, etc.)
  • Plug-compatible code
  • Use standard, "sane" FORTRAN and UNIX