Quantifying the skill of cloud forecasts from the ground and from space
Robin Hogan, Julien Delanoe, Ewan O’Connor, Anthony Illingworth, Jonathan Wilkinson
University of Reading, UK
Other areas of interest
• Representing effects of cloud structure in radiation schemes
  – Horizontal inhomogeneity, overlap, 3D effects
• Mixed-phase clouds
  – Why are they so poorly represented in models?
• Convection
  – Estimating microphysical properties and fluxes of mass and momentum from observations
Overview
• The “Cloudnet” processing of ground-based radar and lidar observations
  – Continuous evaluation of the climatology of clouds in models
• Testing the skill of cloud forecasts from seven models
  – Desirable properties of skill scores; good and bad scores
  – Skill versus cloud fraction, height, scale, forecast lead time
  – Estimating the forecast “half life”
• Cloud fraction evaluation using a spaceborne lidar simulator
  – Evaluation of ECMWF model with ICESat/GLAS lidar
• Synergistic retrievals of ice cloud properties from the A-train
  – Variational methodology
  – Testing of the Met Office and ECMWF models
Project
• Original aim: to retrieve and evaluate the crucial cloud variables in forecast and climate models
  – Seven models: 5 NWP and 2 regional climate models in NWP mode
  – Variables: cloud fraction, LWC, IWC, plus a number of others
  – Four sites across Europe: UK, Netherlands, France, Germany
  – Period: several years, to avoid unrepresentative case studies
• Ongoing/future work (dependent on sources of funding)
  – Apply to ARM data worldwide
  – Run in near-real-time for rapid feedback to NWP centers
  – Evaluate multiple runs of single-column versions of models
Level 1b
• Minimum instrument requirements at each site
  – Cloud radar, lidar, microwave radiometer, rain gauge, model or sondes
[Figure panels: radar; lidar]
Level 1c
• Instrument Synergy product
  – Example of target classification and data quality fields:
[Figure: target classification, with ice, liquid, rain and aerosol classes]
Level 2a/2b
• Cloud products on (L2a) observational and (L2b) model grid
  – Water content and cloud fraction
[Figure panels: L2a IWC on radar/lidar grid; L2b cloud fraction on model grid]
[Figure: cloud fraction at Chilbolton from observations and from the Met Office Mesoscale Model, ECMWF Global Model, Meteo-France ARPEGE Model, KNMI RACMO Model and Swedish RCA model]
Cloud fraction: NCEP over SGP in 2007
• Hot off the press!
• Produced directly from ARM’s “ARSCL” product, so could easily be automated
• NCEP model appears to under-predict low and mid-level cloud
How skillful is a forecast?
• Most model comparisons evaluate the cloud climatology
  – What about individual forecasts?
• Standard measure shows ECMWF forecast “half-life” of ~6 days in 1980 and ~9 days in 2000
  – But virtually insensitive to clouds!
[Figure: ECMWF 500-hPa geopotential anomaly correlation]
Joint PDFs of cloud fraction
• Raw (1 hr) resolution
  – 1 year from Murgtal
  – DWD COSMO model
• 6-hr averaging
[Figure: joint PDFs of observed and modelled cloud fraction, with quadrants labelled a, b, c and d]
…or use a simple contingency table
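The two steps on this slide, temporal averaging and building a joint PDF, can be sketched as follows. This is a minimal sketch assuming hourly observed and modelled cloud-fraction series for one model level at one site; the function names, the equal-width binning and the bin count are illustrative assumptions, not the Cloudnet implementation.

```python
from collections import Counter


def block_average(series, n):
    """Average consecutive blocks of n samples (n=6 turns hourly values into 6-hourly).
    A trailing block shorter than n is dropped."""
    return [sum(series[i:i + n]) / n for i in range(0, len(series) - n + 1, n)]


def joint_pdf(obs, mod, nbins=5):
    """Joint PDF of (observed, modelled) cloud fraction on an nbins x nbins grid.
    Row index = observed bin, column index = modelled bin; entries sum to 1."""
    def bin_of(f):
        # cloud fraction 1.0 goes into the top bin rather than a bin of its own
        return min(int(f * nbins), nbins - 1)

    counts = Counter((bin_of(o), bin_of(m)) for o, m in zip(obs, mod))
    total = sum(counts.values())
    return [[counts[(i, j)] / total for j in range(nbins)] for i in range(nbins)]


# Hypothetical hourly series, averaged to 6-hourly before binning
obs = [0.0, 0.0, 0.1, 0.3, 0.9, 1.0] * 4
mod = [0.0, 0.2, 0.2, 0.5, 0.8, 0.8] * 4
pdf = joint_pdf(block_average(obs, 6), block_average(mod, 6))
```

Summing the grid on either side of a threshold f_thresh on each axis collapses it into the 2×2 contingency table.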
5 desirable properties of skill scores
1. “Equitable”: all random forecasts score zero
   – This is essential!
   – Note that forecasting the right climatology versus height but with no other skill should also score zero
2. “Proper”: not possible to “hedge your bets”
   – Some scores reward under- or over-prediction (e.g. hit rate)
   – Jolliffe and Stephenson: impossible to be equitable and strictly proper!
3. Independent of how often cloud occurs
   – Almost all scores asymptote to 0 or 1 for vanishingly rare events
4. Dependence on full joint PDF, not just 2x2 contingency table
   – Difference between cloud fraction of 0.9 and 1 is as important for radiation as a difference between 0 and 0.1
5. “Linear”: so that we can fit an inverse exponential
   – Some scores (e.g. Yule’s Q) “saturate” at the high-skill end
Possible skill scores
Contingency table:
                     Observed cloud     Observed clear sky
Modeled cloud        a: hit             b: false alarm
Modeled clear sky    c: miss            d: correct negative
• “Cloud” deemed to occur when cloud fraction f is larger than some threshold f_thresh
• To ensure equitability and linearity, we can use the concept of the “generalized skill score” = (x − x_random)/(x_perfect − x_random)
  – Where “x” is any number derived from the joint PDF
  – Resulting scores vary linearly from random = 0 to perfect = 1
                   a            b            c            d
DWD model          a = 7194     b = 4098     c = 4502     d = 41062
Perfect forecast   a_p = 11696  b_p = 0      c_p = 0      d_p = 45160
Random forecast    a_r = 2581   b_r = 8711   c_r = 9115   d_r = 36449
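The generalized skill score can be checked against the DWD counts above. This is a minimal sketch taking x = a + d (the Heidke choice); the random-forecast counts are used as given on the slide rather than recomputed. They share the DWD table's marginal totals (a_r + b_r = a + b, a_r + c_r = a + c), but a_r is larger than the (a + b)(a + c)/n ≈ 2323 implied by those overall totals, presumably because the random forecast was constructed with the correct climatology at each height; that interpretation is an assumption. The function names are hypothetical.

```python
def generalized_skill_score(x, x_random, x_perfect):
    """(x - x_random) / (x_perfect - x_random): 0 for a random forecast, 1 for a perfect one."""
    return (x - x_random) / (x_perfect - x_random)


# Contingency-table counts from the DWD example
dwd = {"a": 7194, "b": 4098, "c": 4502, "d": 41062}
perfect = {"a": 11696, "b": 0, "c": 0, "d": 45160}
random_fc = {"a": 2581, "b": 8711, "c": 9115, "d": 36449}


def heidke_skill_score(table, perfect, random_fc):
    """Heidke skill score: the generalized skill score with x = a + d."""
    def x(t):
        return t["a"] + t["d"]
    return generalized_skill_score(x(table), x(random_fc), x(perfect))


print(round(heidke_skill_score(dwd, perfect, random_fc), 3))  # 0.518
```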
• Simplest example: Heidke skill score (HSS) uses x = a + d
  – We will use this as a reference to test other scores
• Brier skill score uses x = mean squared cloud-fraction difference; Linear Brier skill score (LBSS) uses x = mean absolute difference
  – Sensitive to errors in model for all values of cloud fraction
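The Brier-type scores fit the same generalized form, with a perfect forecast having zero error. In this minimal sketch the random reference pairs every model value with every observation, i.e. the expected error if the two series were shuffled independently. That choice of random forecast is an assumption, since the slide does not say how it was constructed for these scores; the function names and data are hypothetical.

```python
def mean_error(obs, mod, power):
    """Mean |mod - obs|**power over matched pairs (power=2: Brier-type, power=1: linear)."""
    return sum(abs(m - o) ** power for o, m in zip(obs, mod)) / len(obs)


def random_mean_error(obs, mod, power):
    """Expected error of an assumed random forecast: every model value paired with
    every observation, as if the two series were shuffled independently."""
    return sum(abs(m - o) ** power for o in obs for m in mod) / (len(obs) * len(mod))


def error_skill_score(obs, mod, power):
    """Generalized skill score with x = mean error and x_perfect = 0.
    Undefined (division by zero) if every observation and model value is identical."""
    x = mean_error(obs, mod, power)
    x_random = random_mean_error(obs, mod, power)
    return (x - x_random) / (0.0 - x_random)


obs = [0.0, 0.2, 0.9, 1.0, 0.5, 0.0]   # hypothetical cloud fractions
mod = [0.1, 0.3, 0.7, 1.0, 0.4, 0.0]
print("BSS ", round(error_skill_score(obs, mod, power=2), 3))
print("LBSS", round(error_skill_score(obs, mod, power=1), 3))
```

Because every pair contributes, a cloud-fraction error of 0.1 counts the same whether it occurs near 0 or near 1, which is the sensitivity the slide describes.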
Some simpler scores
• Hit rate or Probability of Detection: H = a/(a + c)
  – “Fraction of cloudy events correctly forecast”
  – E.g. Mace et al. (1998) for cloud occurrence
• Problems
  – Not equitable
  – Easy to “hedge”: forecasting cloud all the time guarantees a perfect score, so it favours models that overpredict cloud
  – This is linked to its asymmetry
• Log of Odds Ratio: LOR = ln(ad/bc)
  – E.g. Stephenson (2008) for tornado forecasts
• Properties
  – Equitable
  – Not easy to hedge
  – Unbounded: a perfect score is infinity!
[Figure panels: LOR; H]
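Both scores are simple to compute from the contingency table, and the hedging problem of the hit rate shows up directly. A minimal sketch with hypothetical function names, using the DWD counts from the earlier slide:

```python
import math


def hit_rate(a, b, c, d):
    """H = a / (a + c): fraction of observed cloudy events that were forecast."""
    return a / (a + c)


def log_odds_ratio(a, b, c, d):
    """LOR = ln(ad / bc). Unbounded (inf) when b*c = 0 but a*d > 0, e.g. a perfect
    forecast; undefined (nan) when both products are zero."""
    if b * c == 0:
        return math.inf if a * d > 0 else math.nan
    return math.log((a * d) / (b * c))


# DWD counts from the contingency-table slide
print(round(hit_rate(7194, 4098, 4502, 41062), 3))        # 0.615
print(round(log_odds_ratio(7194, 4098, 4502, 41062), 2))  # 2.77

# Hedging: forecasting cloud all the time turns every miss into a hit and every
# correct negative into a false alarm, so c = d = 0 and H is perfect.
print(hit_rate(7194 + 4502, 4098 + 41062, 0, 0))          # 1.0
```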
Skill versus cloud-fraction threshold
• Consider 7 models evaluated over 3 European sites in 2003-2004
[Figure panels: LOR; HSS]
• LOR implies skill increases for larger cloud-fraction threshold
• HSS implies skill decreases significantly for larger cloud-fraction threshold
Extreme dependency score
• Stephenson et al. (2008) explained this behavior:
  – Almost all scores have a meaningless limit as the “base rate” p → 0
  – HSS tends to zero and LOR tends to infinity
• They proposed the Extreme dependency score:
  $\mathrm{EDS} = \frac{2\ln[(a+c)/n]}{\ln(a/n)} - 1$
  – where n = a + b + c + d
• It can be shown that this score tends to a meaningful limit:
  – Rewrite in terms of hit rate H = a/(a + c) and base rate p = (a + c)/n, so that a/n = Hp:
    $\mathrm{EDS} = \frac{2\ln p}{\ln(Hp)} - 1$
  – Then assume a power-law dependence of H on p as p → 0: $H \approx \kappa p^{\beta}$
  – In the limit p → 0 we find $\mathrm{EDS} \to \frac{2}{1+\beta} - 1 = \frac{1-\beta}{1+\beta}$
  – This is meaningful because random forecasts have a hit rate converging to zero at the same rate as the base rate: β = 1, so EDS = 0
  – Perfect forecasts have a hit rate that stays constant as the base rate falls: β = 0, so EDS = 1
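The limiting behaviour can be checked numerically. This minimal sketch assumes the standard EDS form of Stephenson et al. (2008), 2 ln((a + c)/n)/ln(a/n) − 1, and a hypothetical power law H = 0.5 p^0.5 (β = 0.5), for which the limit should be (1 − 0.5)/(1 + 0.5) = 1/3. The function names are hypothetical.

```python
import math


def eds(a, b, c, d):
    """Extreme dependency score: 2 ln((a + c)/n) / ln(a/n) - 1, with n = a + b + c + d."""
    n = a + b + c + d
    return 2 * math.log((a + c) / n) / math.log(a / n) - 1


def eds_from_rates(H, p):
    """The same score in terms of hit rate H and base rate p, using a/n = H * p."""
    return 2 * math.log(p) / math.log(H * p) - 1


# Perfect forecast (b = c = 0) at a base rate of 1%
print(eds(10, 0, 0, 990))  # 1.0

# Random forecast with the right frequency has H = p (beta = 1): EDS = 0 up to rounding
print(eds_from_rates(H=0.01, p=0.01))

# Power law H = 0.5 * p**0.5 (beta = 0.5): values climb towards 1/3 as p -> 0
for p in (1e-2, 1e-10, 1e-100):
    print(p, eds_from_rates(0.5 * p ** 0.5, p))
```

The convergence is slow because the constant ln κ in ln(Hp) only becomes negligible once ln p is very large in magnitude.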
Symmetric extreme dependency score
• Problems with EDS:
  – Easy to hedge by predicting cloud all the time, so that c = 0
  – Not equitable
• These are solved by defining a symmetric version:
  $\mathrm{SEDS} = \frac{\ln[(a+b)/n] + \ln[(a+c)/n]}{\ln(a/n)} - 1$
  – All the benefits of EDS, none of the drawbacks!
Hogan, O’Connor and Illingworth (2009, submitted to QJRMS)
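The difference between EDS and SEDS is clearest on a hedged forecast. This minimal sketch assumes the symmetric form SEDS = [ln((a + b)/n) + ln((a + c)/n)]/ln(a/n) − 1, i.e. EDS with one of its two ln((a + c)/n) terms replaced by the forecast-frequency term ln((a + b)/n); the function names are hypothetical.

```python
import math


def eds(a, b, c, d):
    """Extreme dependency score, for comparison."""
    n = a + b + c + d
    return 2 * math.log((a + c) / n) / math.log(a / n) - 1


def seds(a, b, c, d):
    """Symmetric extreme dependency score:
    [ln((a + b)/n) + ln((a + c)/n)] / ln(a/n) - 1."""
    n = a + b + c + d
    return (math.log((a + b) / n) + math.log((a + c) / n)) / math.log(a / n) - 1


# Hedged forecast (cloud forecast all the time, so c = d = 0), built from the DWD counts
print(eds(11696, 45160, 0, 0))   # 1.0: EDS awards a perfect score
print(seds(11696, 45160, 0, 0))  # 0.0: SEDS awards no skill

# Random table with a = (a + b)(a + c)/n exactly: SEDS is zero up to rounding
print(seds(10, 10, 40, 40))

# DWD model
print(seds(7194, 4098, 4502, 41062))
```

Forecasting cloud everywhere makes (a + b)/n = 1, so the forecast-frequency term vanishes and the hedge earns nothing.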
Skill versus cloud-fraction threshold
• SEDS has much flatter behaviour for all models (except for Met Office, which underestimates high cloud occurrence significantly)
[Figure panels: LOR; HSS; SEDS]
Skill versus height
• Most scores not reliable near the tropopause because cloud fraction tends to zero
[Figure panels: LOR; HSS; LBSS; SEDS]
• New score reveals:
  – Skill tends to slowly decrease at tropopause
  – Mid-level clouds (4-5 km) most skilfully predicted, particularly by Met Office
  – Boundary-layer clouds least skilfully predicted
A surprise?
• Is mid-level cloud well forecast???
  – Frequency of occurrence of these clouds is commonly too low (e.g. from Cloudnet: Illingworth et al. 2007)
  – Specification of cloud phase cited as a problem
  – Higher skill could be because large-scale ascent has largest amplitude here, so cloud response to large-scale dynamics is clearest at mid levels
  – Higher skill for Met Office models (global and mesoscale) because they have arguably the most sophisticated microphysics, with separate liquid and ice water content (Wilson and Ballard 1999)?
• Low skill for boundary-layer cloud is not a surprise!
  – Well known problem for forecasting (Martin et al. 2000)
  – Occurrence and height a subtle function of subsidence rate, stability, free-troposphere humidity, surface fluxes, entrainment rate...
Skill versus lead time
• Only possible for UK Met Office 12-km model and German DWD 7-km model
  – Steady decrease of skill with lead time
  – Both models appear to improve between 2004 and 2007
• Generally, UK model best over UK, German best over Germany
  – An exception is Murgtal in 2007 (UK model wins)
[Figure panels: 2004; 2007]
Forecast “half life”
• Fit an inverse-exponential: $S(t) = S_1\,2^{(1-t)/\tau_{1/2}}$, with lead time t in days
  – $S_1$ is the score after 1 day and $\tau_{1/2}$ is the half-life
• Noticeably longer half-life fitted after 36 hours
  – Same thing found for Met Office rainfall forecast (Roberts 2008)
  – First timescale due to data assimilation and convective events
  – Second due to more predictable large-scale weather systems
[Figure: skill versus lead time for the Met Office and DWD models in 2004 and 2007, annotated with fitted half-lives between 2.4 and 4.3 days]
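One way to perform the fit is sketched below, assuming S(t) = S_1 2^{(1−t)/τ_{1/2}} with t in days and ordinary least squares on ln S (the slides do not say how the fit was done). Taking logarithms makes the model linear in t, with slope −ln 2/τ_{1/2}. The function name is hypothetical; the two-timescale behaviour noted above could be examined by fitting lead times before and after 36 hours separately.

```python
import math


def fit_half_life(lead_days, scores):
    """Fit S(t) = S1 * 2**((1 - t) / tau) by least squares on ln S.

    Returns (S1, tau): the score after 1 day and the half-life in days.
    Scores must be positive for the logarithm."""
    n = len(lead_days)
    ys = [math.log(s) for s in scores]
    t_mean = sum(lead_days) / n
    y_mean = sum(ys) / n
    slope = (sum((t - t_mean) * (y - y_mean) for t, y in zip(lead_days, ys))
             / sum((t - t_mean) ** 2 for t in lead_days))
    intercept = y_mean - slope * t_mean
    tau = -math.log(2) / slope                # ln S falls by ln 2 every tau days
    s1 = math.exp(intercept + slope * 1.0)    # fitted score at t = 1 day
    return s1, tau


# Synthetic check: scores generated with S1 = 0.6 and a 3-day half-life
leads = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
scores = [0.6 * 2 ** ((1 - t) / 3.0) for t in leads]
print(fit_half_life(leads, scores))
```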
Why is half-life less for clouds than pressure?
• Different spatial scales? Convection?
  – Average temporally before calculating skill scores:
  – Absolute score and half-life increase with number of hours averaged
• Cloud is noisier than geopotential height Z because it is separated by around two orders of differentiation:
  – Cloud ~ vertical wind ~ relative vorticity ~ $\nabla^2$(streamfunction) ~ $\nabla^2$(pressure)
  – Suggests cloud observations should be used routinely to evaluate models
[Figure panels: geopotential height anomaly; vertical velocity]
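One way to see why each order of differentiation makes a field noisier (a standard Fourier argument, not taken from the slides): writing a field in one horizontal dimension as a sum of wave components, the Laplacian multiplies each component by −k², so the relative weight of small-scale (large-k) components grows as k².

```latex
\psi(x) = \sum_k \hat\psi_k\, e^{ikx}
\;\Rightarrow\;
\nabla^2 \psi = \sum_k \left(-k^2\right) \hat\psi_k\, e^{ikx},
\qquad
\frac{\left|(\widehat{\nabla^2\psi})_{k_2}\right|}{\left|(\widehat{\nabla^2\psi})_{k_1}\right|}
  = \left(\frac{k_2}{k_1}\right)^{2} \frac{|\hat\psi_{k_2}|}{|\hat\psi_{k_1}|}
```

For example, a component with ten times the wavenumber of another gains a factor of 100 in relative amplitude, so small-scale structure carries far more weight in the differentiated field than in the pressure field.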
Alternative approach
• How valid is it to estimate 3D cloud fraction from 2D slice?
  – Henderson and Pincus (2009) imply that it is reasonable, although presumably not in convective conditions
• Alternative: treat cloud fraction as a probability forecast
  – Each time the model forecasts a particular cloud fraction, calculate the fraction of time that cloud was observed instantaneously over the site
  – Leads to a Reliability Diagram:
[Figure: reliability diagram (Jakob et al. 2004) with “perfect”, “no skill” and “no resolution” reference lines]
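The reliability-diagram calculation described above can be sketched as follows, assuming model cloud fraction and a yes/no record of whether cloud was observed at each time. The equal-width bins and the function name are illustrative assumptions; points on the diagonal would be perfectly reliable.

```python
def reliability_curve(forecast_cf, observed_cloudy, nbins=10):
    """Reliability diagram data for cloud fraction treated as a probability forecast.

    forecast_cf:     model cloud fraction (0 to 1) at each time
    observed_cloudy: True where cloud was observed instantaneously over the site
    Returns (bin centre, observed frequency, count) per bin; frequency is None
    for empty bins."""
    hits = [0] * nbins
    counts = [0] * nbins
    for f, cloudy in zip(forecast_cf, observed_cloudy):
        k = min(int(f * nbins), nbins - 1)   # forecast-probability bin
        counts[k] += 1
        hits[k] += int(bool(cloudy))
    return [((k + 0.5) / nbins, hits[k] / counts[k] if counts[k] else None, counts[k])
            for k in range(nbins)]


curve = reliability_curve([0.05, 0.05, 0.95, 0.95], [False, True, True, True])
for centre, freq, n in curve:
    if n:
        print(centre, freq, n)
```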
Satellite observations: ICESat
• Cloud observations from ICESat 0.5-micron lidar (first data Feb 2003)
• Global coverage but lidar attenuated by thick clouds: direct model comparison difficult
• Optically thick liquid cloud obscures view of any clouds beneath
• Solution: forward-model the measurements (including attenuation) using the ECMWF variables
[Figure: lidar apparent backscatter coefficient (m⁻¹ sr⁻¹) versus latitude]
Simulate lidar backscatter:
• Create subcolumns with maximum-random overlap
• Forward-model lidar backscatter from ECMWF water content and particle size
• Remove signals below lidar sensitivity
[Figure panels: ECMWF raw cloud fraction; ECMWF cloud fraction after processing; ICESat cloud fraction]
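The three simulator steps can be sketched in a few dozen lines. This is a minimal sketch, not the Wilkinson et al. code. Assumptions: in-cloud extinction is supplied directly instead of being derived from ECMWF water content and particle size, a single fixed extinction-to-backscatter ratio is used, multiple scattering is ignored, and the sensitivity threshold `min_beta` is hypothetical, as are all function names.

```python
import math
import random


def maxrand_subcolumns(cf, ncol, rng):
    """Cloudy/clear subcolumns (top level first) with maximum-random overlap:
    vertically adjacent cloud overlaps maximally, cloud separated by clear air
    overlaps randomly (a Raisanen et al. 2004 style generator)."""
    cols = []
    for _ in range(ncol):
        x = rng.random()
        col = []
        for k, f in enumerate(cf):
            if k > 0 and not col[-1]:
                # clear in the layer above: redraw within that layer's clear part
                x = rng.random() * (1.0 - cf[k - 1])
            col.append(x > 1.0 - f)
        cols.append(col)
    return cols


def attenuated_backscatter(extinction, dz, lidar_ratio):
    """Apparent backscatter (m-1 sr-1) for a downward-looking lidar, top gate first:
    beta = extinction / lidar_ratio, reduced by exp(-2 * optical depth to the gate
    centre). Multiple scattering is ignored."""
    out, tau = [], 0.0
    for alpha in extinction:
        out.append(alpha / lidar_ratio * math.exp(-2.0 * (tau + 0.5 * alpha * dz)))
        tau += alpha * dz
    return out


def simulated_cloud_fraction(cf, cloud_ext, dz, lidar_ratio, min_beta, ncol=200, seed=0):
    """Cloud fraction the lidar would report: fraction of subcolumns per level
    whose attenuated backscatter exceeds the sensitivity threshold min_beta."""
    rng = random.Random(seed)
    detected = [0] * len(cf)
    for col in maxrand_subcolumns(cf, ncol, rng):
        ext = [e if cloudy else 0.0 for e, cloudy in zip(cloud_ext, col)]
        for k, beta in enumerate(attenuated_backscatter(ext, dz, lidar_ratio)):
            detected[k] += beta > min_beta   # signals below sensitivity are removed
    return [n / ncol for n in detected]


# Thin cloud above a thick layer above more thin cloud, all overcast (100-m gates)
print(simulated_cloud_fraction([1.0, 1.0, 1.0], [1e-4, 0.05, 1e-4], dz=100.0,
                               lidar_ratio=20.0, min_beta=1e-7))  # [1.0, 1.0, 0.0]
```

The lowest layer is genuinely overcast in the model but disappears from the simulated lidar view, which is exactly why the model must be processed this way before it is compared with the observations.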
Global cloud fraction comparison
[Figure panels: ECMWF raw cloud fraction; ECMWF processed cloud fraction; ICESat cloud fraction]
Wilkinson, Hogan, Illingworth and Benedetti (MWR 2008)
• Results for October 2003
  – Tropical convection peaks too high
  – Too much polar cloud
  – Elsewhere agreement is good
• Results can be ambiguous
  – An apparent low-cloud underestimate could be a real error, or could be due to the high cloud above being too thick
Testing the model climatology
[Figure annotations: reduction in model cloud due to lidar attenuation; error due to uncertain extinction-to-backscatter ratio]
Testing the model skill from space
• Clearly need to apply SEDS to cloud estimated from lidar and radar!
• Unreliable region
• Lowest skill: tropical boundary-layer clouds
• Tropical skill appears to peak at mid-levels, but cloud is very infrequent here
• Highest skill in the northern mid-latitude and polar upper troposphere
• Is some of the reduction of skill at low levels because of lidar attenuation?
Ice cloud retrievals from the A-train
• Advantages of combining radar, lidar and radiometers
  – Radar $Z \propto D^6$ and lidar $\beta' \propto D^2$, so the combination provides particle size
  – Radiances ensure that the retrieved profiles can be used for radiative transfer studies
• How do we combine them optimally?
  – Use a “variational” framework: takes full account of observational errors
  – Straightforward to add extra constraints and extra instruments
  – Allows seamless retrieval between regions of different instrument sensitivity
• Retrievals will be compared to Met Office and ECMWF forecasts under the A-train
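Why the radar-lidar pair gives particle size can be seen in an idealized case: the number concentration N cancels in the ratio, leaving a D⁴ dependence. This minimal sketch assumes identical particles and sets both proportionality constants to 1; real retrievals use size distributions and scattering models, and the function names are hypothetical.

```python
def radar_lidar_ratio(number_conc, diameter):
    """Z / beta' for a population of identical particles, taking Z proportional to
    N D^6 and beta' proportional to N D^2 with both constants set to 1."""
    z = number_conc * diameter ** 6
    beta = number_conc * diameter ** 2
    return z / beta


def diameter_from_ratio(ratio):
    """Invert ratio = D^4 (in these arbitrary units) for the particle size."""
    return ratio ** 0.25


# The number concentration cancels, so the ratio depends on size alone
for n in (1.0, 1e3, 1e6):
    print(n, radar_lidar_ratio(n, 2.0), diameter_from_ratio(radar_lidar_ratio(n, 2.0)))
```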
Formulation of variational scheme
For each ray of data we define:
• Observation vector y
  – Attenuated lidar backscatter profile
  – Radar reflectivity factor profile (on a different grid)
  – Visible optical depth
  – Infrared radiance
  – Radiance difference
• State vector x
  – Ice visible extinction coefficient profile
  – Ice normalized number concentration profile
  – Extinction/backscatter ratio for ice
  – (TBD) Aerosol visible extinction coefficient profile
  – (TBD) Liquid water path and number concentration for each liquid layer
• Elements may be missing
• Logarithms prevent unphysical negative values
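The cost function itself is not written out on these slides. One form consistent with the Gauss-Newton step given for the solution method, stated here as an assumption, has observation, a-priori and smoothness terms, with R the observation error covariance, B the a-priori error covariance, b the a-priori state and T the smoothness matrix (taken to be symmetric):

```latex
J(\mathbf{x}) = \tfrac{1}{2}[\mathbf{y} - H(\mathbf{x})]^T \mathbf{R}^{-1} [\mathbf{y} - H(\mathbf{x})]
  + \tfrac{1}{2}(\mathbf{x} - \mathbf{b})^T \mathbf{B}^{-1} (\mathbf{x} - \mathbf{b})
  + \tfrac{1}{2}\mathbf{x}^T \mathbf{T}\,\mathbf{x}

\nabla_{\mathbf{x}} J = -\mathbf{H}^T \mathbf{R}^{-1} [\mathbf{y} - H(\mathbf{x})]
  + \mathbf{B}^{-1} (\mathbf{x} - \mathbf{b}) + \mathbf{T}\,\mathbf{x},
\qquad
\nabla^2_{\mathbf{x}} J \approx \mathbf{A} = \mathbf{H}^T \mathbf{R}^{-1} \mathbf{H} + \mathbf{B}^{-1} + \mathbf{T}
```

The Hessian approximation drops second derivatives of the forward model, and the step x_{k+1} = x_k − A⁻¹∇J then reproduces the Gauss-Newton update used in the solution method.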
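Solution method
• An iterative method is required to minimize the cost function
1. New ray of data: locate cloud with radar and lidar, define the elements of x and make a first guess of x
2. Forward model: predict the measurements y from the state vector x using the forward model H(x), and predict the Jacobian $\mathbf{H} = \partial y_i/\partial x_j$
3. Has the solution converged? ($\chi^2$ convergence test)
   – No: Gauss-Newton iteration step; predict the new state vector
     $\mathbf{x}_{k+1} = \mathbf{x}_k + \mathbf{A}^{-1}\{\mathbf{H}^T\mathbf{R}^{-1}[\mathbf{y} - H(\mathbf{x}_k)] - \mathbf{B}^{-1}(\mathbf{x}_k - \mathbf{b}) - \mathbf{T}\mathbf{x}_k\}$
     where the Hessian is $\mathbf{A} = \mathbf{H}^T\mathbf{R}^{-1}\mathbf{H} + \mathbf{B}^{-1} + \mathbf{T}$, then return to step 2
   – Yes: calculate the error in the retrieval and proceed to the next ray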
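The iteration can be sketched without any dependencies on a toy two-gate problem. Assumptions: diagonal R and B passed as variances, a step-size convergence test in place of the χ² test, and a toy exponential forward model standing in for the radar, lidar and radiance forward models. The function names are hypothetical.

```python
import math


def solve(A, v):
    """Solve A z = v by Gaussian elimination with partial pivoting (small dense A)."""
    n = len(v)
    M = [[float(e) for e in A[i]] + [float(v[i])] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z


def gauss_newton(y, forward, jacobian, r_var, b, b_var, T, x0, max_iter=50, tol=1e-10):
    """Gauss-Newton steps x_{k+1} = x_k + A^-1 {H^T R^-1 [y - H(x_k)] - B^-1 (x_k - b) - T x_k},
    A = H^T R^-1 H + B^-1 + T, for diagonal R and B given as variances."""
    x = [float(v) for v in x0]
    n, m = len(x), len(y)
    A = None
    for _ in range(max_iter):
        hx = forward(x)
        H = jacobian(x)  # m x n Jacobian, H[i][j] = dy_i/dx_j
        g = [sum(H[i][j] * (y[i] - hx[i]) / r_var[i] for i in range(m))
             - (x[j] - b[j]) / b_var[j]
             - sum(T[j][k] * x[k] for k in range(n))
             for j in range(n)]
        A = [[sum(H[i][j] * H[i][k] / r_var[i] for i in range(m))
              + (1.0 / b_var[j] if j == k else 0.0) + T[j][k]
              for k in range(n)] for j in range(n)]
        dx = solve(A, g)
        x = [xj + dj for xj, dj in zip(x, dx)]
        if max(abs(d) for d in dx) < tol:  # step-size test instead of a chi^2 test
            break
    return x, A


# Toy problem: the state is ln(extinction) in two gates (logs keep extinction
# positive); the observations are the two extinctions and their sum.
def forward(x):
    e0, e1 = math.exp(x[0]), math.exp(x[1])
    return [e0, e1, e0 + e1]


def jacobian(x):
    e0, e1 = math.exp(x[0]), math.exp(x[1])
    return [[e0, 0.0], [0.0, e1], [e0, e1]]


y = [2.0, 5.0, 7.0]
x, A = gauss_newton(y, forward, jacobian, r_var=[1e-4] * 3, b=[0.0, 0.0],
                    b_var=[1e6, 1e6], T=[[0.0, 0.0], [0.0, 0.0]], x0=[0.0, 0.0])
print([round(math.exp(v), 4) for v in x])  # [2.0, 5.0]
```

With a weak prior the observations dominate and the truth is recovered; shrinking `b_var` pulls the solution towards the a-priori state instead.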
CloudSat-CALIPSO-MODIS example
[Figure: 1000-km section showing lidar observations, radar observations and the radar forward model]
CloudSat-CALIPSO-MODIS example (continued)
[Figure: lidar and radar observations with their forward models; retrieved extinction coefficient, ice water content and effective radius]
[Figure: MODIS 10.8-µm observations compared with the forward model from the radar-lidar retrieval]
…add infrared radiances
• Radiances matched by increasing extinction near cloud top
[Figure: MODIS 10.8-µm observations compared with the forward model]
Radar-lidar complementarity
[Figure: CloudSat radar, CALIPSO lidar, MODIS 11-micron channel and retrieved extinction (m⁻¹); height (km) versus time since start of orbit (s)]
• Cirrus detected only by lidar
• Mid-level liquid clouds
• Deep convection penetrated only by radar
Comparison with ECMWF
[Figure: log10(IWC [kg m⁻³]) versus temperature (°C) from the A-Train retrieval and the ECMWF model]
Comparison with model IWC
[Figure: log10(IWC) versus temperature (°C) for the Met Office and ECMWF models]
• Global forecast model data extracted underneath the A-Train
• A-Train ice water content averaged to model grid
  – Met Office model lacks observed variability
  – ECMWF model has artificial threshold for snow at around 10⁻⁴ kg m⁻³
Summary and outlook
• Defined five key properties of a good skill score
  – Plenty of bad scores are used (hit rate, false-alarm rate, etc.)
  – New “Symmetric extreme dependency score” is equitable and nearly independent of the occurrence of the quantity being forecast
• Model comparisons reveal
  – Half-life of a cloud forecast is between 2.5 and 4 days, much less than ~9 days for ECMWF 500-hPa geopotential height forecast
  – Longer-timescale predictability after 1.5 days
  – Higher skill for mid-level cloud and lower for boundary-layer cloud
  – Proposal submitted to apply some of these metrics (including probabilistic ones) to NWP and single-column models over the ARM sites
• Further work with radar-lidar-radiometer retrieval
  – Being used to test new ice cloud scheme in ECMWF model, as well as high-resolution simulations of tropical convection in “Cascade” project
  – Retrieve liquid clouds and precipitation at the same time to provide a truly seamless retrieval from the thinnest to the thickest clouds
  – Adapt for EarthCARE satellite (ESA/JAXA: launch 2013)
Cloud fraction in 7 models
• Mean and PDF for 2004 for Chilbolton, Paris and Cabauw
Illingworth et al. (BAMS 2007)
[Figure: mean cloud fraction and PDF, 0-7 km]
  – Uncertain above 7 km, as clouds the instruments could not detect must be removed from the model
  – All models except DWD underestimate mid-level cloud
  – Some have separate “radiatively inactive” snow (ECMWF, DWD); Met Office has combined ice and snow but still underestimates cloud fraction
  – Wide range of low cloud amounts in models
  – Not enough overcast boxes, particularly in Met Office model
Contingency tables
Comparison with Met Office model over Chilbolton, October 2003
                   Observed cloud    Observed clear-sky
Model cloud        A: Cloud hit      B: False alarm
Model clear-sky    C: Miss           D: Clear-sky hit
Monthly skill versus time
• Measure of the skill of forecasting cloud fraction > 0.05
  – Comparing models using similar forecast lead time
  – Compared with the persistence forecast (yesterday’s measurements)
• Lower skill in summer convective events
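A comparison against persistence can be set up as below. This is a minimal sketch assuming hourly cloud fractions, the 0.05 threshold from the slide, and a Heidke skill score whose random reference is taken from the table's own margins (an assumption; the slides' random forecast may be defined differently). The function names are hypothetical.

```python
def contingency(obs_cf, mod_cf, threshold=0.05):
    """Counts (a, b, c, d) with "cloud" meaning cloud fraction > threshold."""
    a = b = c = d = 0
    for o, m in zip(obs_cf, mod_cf):
        if m > threshold and o > threshold:
            a += 1          # hit
        elif m > threshold:
            b += 1          # false alarm
        elif o > threshold:
            c += 1          # miss
        else:
            d += 1          # correct negative
    return a, b, c, d


def heidke(a, b, c, d):
    """HSS with the random forecast taken from the table's own margins (an assumption)."""
    n = a + b + c + d
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n   # a_r + d_r
    return (a + d - expected) / (n - expected)


def persistence(obs_cf, lag=24):
    """Persistence forecast: the value observed `lag` samples earlier (24 hourly
    samples = yesterday). Returns (forecast, verifying observations)."""
    return obs_cf[:-lag], obs_cf[lag:]


# Hypothetical hourly observations that repeat exactly every day
obs = ([0.0] * 12 + [0.5] * 12) * 3
fc, verif = persistence(obs)
print(contingency(verif, fc), heidke(*contingency(verif, fc)))  # (24, 0, 0, 24) 1.0
```

Applying the same two functions to the model forecast over the same verifying period gives the pair of scores to compare.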
Why $N_0^*/\alpha^{0.6}$?
• In-situ aircraft data show that $N_0^*/\alpha^{0.6}$ has a temperature dependence that is independent of IWC
• Therefore we have a good a-priori estimate to constrain the retrieval
• Also assume vertical correlation to spread information in height, particularly to parts of the profile detected by only one instrument
Why N0*???
• We need to be able to forward-model Z and other variables from x
• Large scatter between extinction and Z implies a 2D lookup table is required
• When normalized by N0*, there is a near-unique relationship between α/N0* and Z/N0* (as well as r_e, IWC/N0*, etc.)
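The practical benefit is that a 1D table suffices for the forward model. This minimal sketch predicts Z from the state variables (α, N0*) by normalizing, interpolating linearly in log space and multiplying N0* back in. The table values are hypothetical placeholders, NOT the retrieval's table, and the function name is hypothetical.

```python
import bisect
import math

# Hypothetical 1D table: ln(alpha/N0*) -> ln(Z/N0*). Illustrative numbers only.
LN_ALPHA_OVER_N = [-14.0, -13.0, -12.0, -11.0, -10.0]
LN_Z_OVER_N = [-10.0, -8.2, -6.5, -4.9, -3.4]


def forward_z(alpha, n0_star):
    """Forward-model Z from the state variables (alpha, N0*): normalize by N0*,
    interpolate the 1D table linearly in log space, then multiply N0* back in."""
    u = math.log(alpha / n0_star)
    u = min(max(u, LN_ALPHA_OVER_N[0]), LN_ALPHA_OVER_N[-1])  # clamp, no extrapolation
    k = min(bisect.bisect_right(LN_ALPHA_OVER_N, u), len(LN_ALPHA_OVER_N) - 1)
    x0, x1 = LN_ALPHA_OVER_N[k - 1], LN_ALPHA_OVER_N[k]
    y0, y1 = LN_Z_OVER_N[k - 1], LN_Z_OVER_N[k]
    return n0_star * math.exp(y0 + (y1 - y0) * (u - x0) / (x1 - x0))


# Scaling alpha and N0* together scales Z by the same factor
print(forward_z(2e-4, 2e1) / forward_z(1e-4, 1e1))
```

Without the normalization, Z would depend on α and N0* separately, which is what would force a 2D table.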
Ice cloud: non-variational retrieval
• Donovan et al. (2000) algorithm can only be applied where both lidar and radar have signal
[Figure panels: observations; state variables; derived variables]
• Retrieval is accurate but not perfectly stable where lidar loses signal
Aircraft-simulated profiles with noise (from Hogan et al. 2006)
Variational radar/lidar retrieval
• Noise in lidar backscatter feeds through to retrieved extinction
[Figure panels: observations; state variables; derived variables]
• Lidar noise matched by retrieval
• Noise feeds through to other variables
…add smoothness constraint
• Smoothness constraint: add a term to the cost function to penalize curvature in the solution ($J' = \lambda \sum_i (d^2\alpha_i/dz^2)^2$)
[Figure panels: observations; state variables; derived variables]
• Retrieval reverts to a-priori N0
• Extinction and IWC too low in radar-only region
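A curvature penalty of this kind can be written as a quadratic form in the state vector. This minimal sketch assumes the penalty is the squared second difference on a uniform height grid (the usual Twomey-Tikhonov form) and, for a cost function that carries a factor ½ on its smoothness term, T = 2λDᵀD/Δz⁴ with D the second-difference operator. The function names are hypothetical.

```python
def curvature_penalty(x, lam, dz=1.0):
    """J' = lam * sum_i (d2x_i/dz2)^2 using centred second differences on a uniform grid."""
    return lam * sum(((x[i] - 2.0 * x[i + 1] + x[i + 2]) / dz ** 2) ** 2
                     for i in range(len(x) - 2))


def smoothness_matrix(n, lam, dz=1.0):
    """T = 2 lam D^T D / dz^4, where D is the (n-2) x n second-difference operator,
    so that 1/2 x^T T x equals curvature_penalty(x, lam, dz)."""
    D = [[0.0] * n for _ in range(n - 2)]
    for i in range(n - 2):
        D[i][i], D[i][i + 1], D[i][i + 2] = 1.0, -2.0, 1.0
    return [[2.0 * lam * sum(D[r][j] * D[r][k] for r in range(n - 2)) / dz ** 4
             for k in range(n)] for j in range(n)]


x = [0.0, 1.0, 0.0, 2.0]
T = smoothness_matrix(len(x), lam=0.5)
quad = 0.5 * sum(x[j] * T[j][k] * x[k] for j in range(4) for k in range(4))
print(curvature_penalty(x, 0.5), quad)               # 6.5 6.5
print(curvature_penalty([1.0, 2.0, 3.0, 4.0], 0.5))  # 0.0: straight lines cost nothing
```

Because straight lines cost nothing, the penalty cannot pull the solution back towards a-priori values on its own; it only suppresses gate-to-gate noise.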
…add a-priori error correlation
• Use B (the a-priori error covariance matrix) to smooth the N0 information in the vertical
[Figure panels: observations; state variables; derived variables]
• Vertical correlation of error in N0
• Extinction and IWC now more accurate
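How off-diagonal terms in B spread information vertically can be shown with a single constrained gate. This minimal sketch assumes an exponential error correlation with a hypothetical length scale and applies the linear optimal-estimation update for one observed state element; the heights, prior and function names are hypothetical.

```python
import math


def apriori_covariance(heights, sigma, corr_length):
    """B_ij = sigma^2 * exp(-|z_i - z_j| / L): a-priori errors correlated over length L."""
    return [[sigma ** 2 * math.exp(-abs(zi - zj) / corr_length) for zj in heights]
            for zi in heights]


def update_one_gate(B, b, k, y, r_var):
    """Linear optimal-estimation update when only state element k is observed:
    x_j = b_j + B_jk / (B_kk + r_var) * (y - b_k)."""
    return [bj + B[j][k] / (B[k][k] + r_var) * (y - b[k]) for j, bj in enumerate(b)]


heights = [0.0, 100.0, 200.0, 1000.0]     # metres
prior = [0.0, 0.0, 0.0, 0.0]              # e.g. a-priori ln(N0*) anomaly
B = apriori_covariance(heights, sigma=1.0, corr_length=200.0)
print([round(v, 3) for v in update_one_gate(B, prior, 0, 1.0, r_var=0.0)])
# [1.0, 0.607, 0.368, 0.007]
```

With a diagonal B (a vanishing correlation length) only the observed gate would move, which is the behaviour the previous slide showed in the radar-only region.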
Effective radius versus temperature
[Figure: effective radius versus temperature for all clouds]
• An effective radius parameterization?
Comparison of mean effective radius
• July 2006 mean value of $r_e = 3\,\mathrm{IWP}/(2\rho_i\tau)$ from CloudSat-CALIPSO only
• Just the top 500 m of cloud
• MODIS/Aqua standard product
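The effective-radius definition is a one-line calculation from ice water path and optical depth. A minimal sketch; the ice density of 917 kg m⁻³ is an assumed value not stated on the slide, and the example column is hypothetical.

```python
def effective_radius(iwp, tau, rho_ice=917.0):
    """r_e = 3 IWP / (2 rho_i tau).

    iwp: ice water path (kg m-2); tau: visible optical depth (dimensionless);
    rho_ice: density of solid ice (kg m-3). Returns r_e in metres."""
    return 3.0 * iwp / (2.0 * rho_ice * tau)


# Hypothetical column: IWP = 50 g m-2, optical depth 2
print(round(effective_radius(0.05, 2.0) * 1e6, 1))  # 40.9 (micrometres)
```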
Comparison of ice water path
[Figure panels: mean of all skies; mean of clouds; CloudSat-CALIPSO and MODIS]
• Need a longer period than just one month (July 2006) to obtain adequate statistics from the poorer sampling of radar and lidar
Comparison of optical depth
[Figure panels: mean of all skies; mean of clouds; CloudSat-CALIPSO and MODIS]
• Mean optical depth from CloudSat-CALIPSO is lower than MODIS simply because CALIPSO detected many more optically thin clouds not seen by MODIS
• Hence need to compare PDFs as well