Quantifying the skill of cloud forecasts from the ground and from space

Robin Hogan, Julien Delanoe, Ewan O’Connor, Anthony Illingworth, Jonathan Wilkinson

University of Reading, UK

Other areas of interest

• Representing effects of cloud structure in radiation schemes
  – Horizontal inhomogeneity, overlap, 3D effects
• Mixed-phase clouds
  – Why are they so poorly represented in models?
• Convection
  – Estimating microphysical properties and fluxes of mass and momentum from observations

Overview

• The “Cloudnet” processing of ground-based radar and lidar observations
  – Continuous evaluation of the climatology of clouds in models
• Testing the skill of cloud forecasts from seven models
  – Desirable properties of skill scores; good and bad scores
  – Skill versus cloud fraction, height, scale, forecast lead time
  – Estimating the forecast “half life”
• Cloud fraction evaluation using a spaceborne lidar simulator
  – Evaluation of ECMWF model with ICESat/GLAS lidar
• Synergistic retrievals of ice cloud properties from the A-train
  – Variational methodology
  – Testing of the Met Office and ECMWF models

Project

• Original aim: to retrieve and evaluate the crucial cloud variables in forecast and climate models
  – Seven models: 5 NWP and 2 regional climate models in NWP mode
  – Variables: cloud fraction, LWC, IWC, plus a number of others
  – Four sites across Europe: UK, Netherlands, France, Germany
  – Period: several years, to avoid unrepresentative case studies
• Ongoing/future work (dependent on sources of funding)
  – Apply to ARM data worldwide
  – Run in near-real-time for rapid feedback to NWP centers
  – Evaluate multiple runs of single-column versions of models

Level 1b

• Minimum instrument requirements at each site
  – Cloud radar, lidar, microwave radiometer, rain gauge, model or sondes

Figure panels: Radar; Lidar

Level 1c

• Instrument Synergy product
  – Example of target classification and data quality fields

Figure labels: Ice; Liquid; Rain; Aerosol

Level 2a/2b

• Cloud products on (L2a) observational and (L2b) model grid
  – Water content and cloud fraction

Figure panels: L2a IWC on radar/lidar grid; L2b cloud fraction on model grid

Cloud fraction

Figure panels: Chilbolton observations; Met Office Mesoscale Model; ECMWF Global Model; Meteo-France ARPEGE Model; KNMI RACMO Model; Swedish RCA model

NCEP over SGP in 2007

• Hot off the press!
• Produced directly from ARM’s “ARSCL” product so could easily be automated
• NCEP model appears to under-predict low and mid-level cloud

How skillful is a forecast?

• Most model comparisons evaluate the cloud climatology
  – What about individual forecasts?
• Standard measure shows ECMWF forecast “half-life” of ~6 days in 1980 and ~9 days in 2000
  – But virtually insensitive to clouds!

Figure: ECMWF 500-hPa geopotential anomaly correlation

Joint PDFs of cloud fraction

• Raw (1 hr) resolution
  – 1 year from Murgtal
  – DWD COSMO model
• 6-hr averaging
• …or use a simple contingency table

Figure labels: quadrants a, b, c and d marked on the joint PDF
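A minimal sketch of how a joint set of model and observed cloud fractions could be collapsed into a 2x2 contingency table. It assumes that "cloud" means a cloud fraction above a threshold; the function name `contingency_table`, the default threshold of 0.05 and the sample values are illustrative, not Cloudnet data.

```python
import numpy as np

def contingency_table(f_model, f_obs, f_thresh=0.05):
    """Return counts (a, b, c, d) for paired cloud-fraction samples.

    a = hits, b = false alarms, c = misses, d = correct negatives.
    """
    model_cloud = np.asarray(f_model) > f_thresh
    obs_cloud = np.asarray(f_obs) > f_thresh
    a = int(np.sum(model_cloud & obs_cloud))
    b = int(np.sum(model_cloud & ~obs_cloud))
    c = int(np.sum(~model_cloud & obs_cloud))
    d = int(np.sum(~model_cloud & ~obs_cloud))
    return a, b, c, d

# Illustrative values only (hypothetical, not from any model or site)
f_model = [0.0, 0.3, 0.9, 0.0, 0.6, 0.02]
f_obs = [0.0, 0.5, 0.0, 0.4, 1.0, 0.00]
table = contingency_table(f_model, f_obs)
print(table)
```

Thresholding discards the difference between, say, cloud fractions of 0.9 and 1, which is why scores based on the full joint PDF are also of interest.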

5 desirable properties of skill scores

1. “Equitable”: all random forecasts score zero
  – This is essential!
  – Note that forecasting the right climatology versus height but with no other skill should also score zero
2. “Proper”: not possible to “hedge your bets”
  – Some scores reward under- or over-prediction (e.g. hit rate)
  – Jolliffe and Stephenson: impossible to be equitable and strictly proper!
3. Independent of how often cloud occurs
  – Almost all scores asymptote to 0 or 1 for vanishingly rare events
4. Dependence on full joint PDF, not just 2x2 contingency table
  – Difference between cloud fraction of 0.9 and 1 is as important for radiation as a difference between 0 and 0.1
5. “Linear”: so that an inverse exponential can be fitted to the score
  – Some scores (e.g. Yule’s Q) “saturate” at the high-skill end


Possible skill scores

Contingency table

                      Observed cloud      Observed clear sky
Modeled cloud         a (hit)             b (false alarm)
Modeled clear sky     c (miss)            d (correct negative)

“Cloud” deemed to occur when cloud fraction f is larger than some threshold f_thresh

                      a        b        c        d
DWD model             7194     4098     4502     41062
Perfect forecast      11696    0        0        45160
Random forecast       2581     8711     9115     36449

• To ensure equitability and linearity, we can use the concept of the “generalized skill score” = (x - x_random)/(x_perfect - x_random)
  – Where “x” is any number derived from the joint PDF
  – Resulting scores vary linearly from random = 0 to perfect = 1
• Simplest example: Heidke skill score (HSS) uses x = a + d
  – We will use this as a reference to test other scores
• Brier skill score uses x = mean squared cloud-fraction difference; Linear Brier skill score (LBSS) uses x = mean absolute difference
  – Sensitive to errors in model for all values of cloud fraction
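As a worked example of the generalized skill score, the sketch below evaluates the Heidke skill score (x = a + d) using the DWD, perfect and random contingency-table counts quoted on this slide. The function and variable names are illustrative.

```python
def generalized_skill_score(x, x_random, x_perfect):
    """(x - x_random) / (x_perfect - x_random): 0 for random, 1 for perfect."""
    return (x - x_random) / (x_perfect - x_random)

# Counts as quoted on the slide
dwd = dict(a=7194, b=4098, c=4502, d=41062)
perfect = dict(a=11696, b=0, c=0, d=45160)
random_fc = dict(a=2581, b=8711, c=9115, d=36449)

def n_correct(table):
    """x = a + d, the number of correct forecasts (the HSS choice of x)."""
    return table["a"] + table["d"]

hss = generalized_skill_score(n_correct(dwd), n_correct(random_fc), n_correct(perfect))
print(f"HSS = {hss:.3f}")
```

With these counts the score comes out at roughly 0.52, i.e. about halfway between a random and a perfect forecast on this linear scale.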

Some simpler scores

• Hit rate or Prob. of Detection: H = a/(a+c)
  – “Fraction of cloudy events correctly forecast”
  – E.g. Mace et al. (1998) for cloud occurrence
• Problems
  – Not equitable
  – Easy to “hedge”: forecast cloud all the time guarantees a perfect score, so favours models that overpredict cloud
  – This is linked to its asymmetry
• Log of Odds Ratio: LOR = ln(ad/bc)
  – E.g. Stephenson (2008) for tornado forecasts
• Properties
  – Equitable
  – Not easy to hedge
  – Unbounded: a perfect score is infinity!

Figure panels: LOR; H
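The two scores above can be sketched as follows, using the DWD contingency-table counts quoted earlier in the talk (a = 7194, b = 4098, c = 4502, d = 41062). The "hedged" table is a hypothetical forecast that predicts cloud all the time, included only to illustrate why the hit rate is easy to hedge.

```python
import math

def hit_rate(a, b, c, d):
    """H = a / (a + c): fraction of observed cloudy events that were forecast."""
    return a / (a + c)

def log_odds_ratio(a, b, c, d):
    """LOR = ln(ad / bc); undefined (infinite) if b or c is zero."""
    return math.log((a * d) / (b * c))

a, b, c, d = 7194, 4098, 4502, 41062
h = hit_rate(a, b, c, d)
lor = log_odds_ratio(a, b, c, d)

# Hypothetical hedged forecast: always predict cloud, so every observed
# cloud becomes a hit and every clear-sky case becomes a false alarm.
hedged = (a + c, b + d, 0, 0)
h_hedged = hit_rate(*hedged)

print(f"H = {h:.3f}, LOR = {lor:.3f}, hedged H = {h_hedged:.1f}")
```

The hedged forecast has no skill yet achieves the maximum hit rate, whereas LOR cannot be evaluated for it at all because bc = 0.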

Skill versus cloud-fraction threshold

• Consider 7 models evaluated over 3 European sites in 2003-2004
  – LOR implies skill increases for larger cloud-fraction threshold
  – HSS implies skill decreases significantly for larger cloud-fraction threshold

Figure panels: LOR; HSS

Extreme dependency score

• Stephenson et al. (2008) explained this behavior:
  – Almost all scores have a meaningless limit as “base rate” $p \to 0$
  – HSS tends to zero and LOR tends to infinity
• They proposed the Extreme dependency score:
  – $\mathrm{EDS} = \frac{2\ln[(a+c)/n]}{\ln(a/n)} - 1$
  – where n = a + b + c + d
• It can be shown that this score tends to a meaningful limit:
  – Rewrite in terms of hit rate H = a/(a+c) and base rate p = (a+c)/n: $\mathrm{EDS} = \frac{2\ln p}{\ln(Hp)} - 1$
  – Then assume a power-law dependence of H on p as $p \to 0$: $H \propto p^{\delta}$
  – In the limit $p \to 0$ we find $\mathrm{EDS} \to \frac{1-\delta}{1+\delta}$
  – This is meaningful because random forecasts have hit rate converging to zero at the same rate as base rate: $\delta = 1$ so EDS = 0
  – Perfect forecasts have constant hit rate with base rate: $\delta = 0$ so EDS = 1

Symmetric extreme dependency score

• Problems with EDS:
  – Easy to hedge by predicting cloud all the time so c = 0
  – Not equitable
• These are solved by defining a symmetric version:
  – $\mathrm{SEDS} = \frac{\ln[(a+b)/n] + \ln[(a+c)/n]}{\ln(a/n)} - 1$
  – All the benefits of EDS, none of the drawbacks!

Hogan, O’Connor and Illingworth (2009, submitted to QJRMS)
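A short sketch comparing the two scores, assuming the published definitions EDS = 2 ln[(a+c)/n] / ln(a/n) - 1 (Stephenson et al. 2008) and SEDS = {ln[(a+b)/n] + ln[(a+c)/n]} / ln(a/n) - 1 (Hogan et al. 2009), with n = a + b + c + d. The counts are the DWD values quoted earlier in the talk; the "hedged" table is a hypothetical forecast of cloud all the time.

```python
import math

def eds(a, b, c, d):
    """Extreme dependency score."""
    n = a + b + c + d
    return 2.0 * math.log((a + c) / n) / math.log(a / n) - 1.0

def seds(a, b, c, d):
    """Symmetric extreme dependency score."""
    n = a + b + c + d
    return (math.log((a + b) / n) + math.log((a + c) / n)) / math.log(a / n) - 1.0

a, b, c, d = 7194, 4098, 4502, 41062
eds_dwd = eds(a, b, c, d)
seds_dwd = seds(a, b, c, d)

# Hypothetical hedged forecast: cloud predicted all the time, so c = 0
hedged = (a + c, b + d, 0, 0)
eds_hedged = eds(*hedged)
seds_hedged = seds(*hedged)

print(f"EDS = {eds_dwd:.3f}, SEDS = {seds_dwd:.3f}")
print(f"hedged: EDS = {eds_hedged:.3f}, SEDS = {seds_hedged:.3f}")
```

The hedged forecast scores a perfect EDS but a SEDS of zero, which illustrates the hedging problem that the symmetric version is designed to remove.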

Skill versus cloud-fraction threshold

• SEDS has much flatter behaviour for all models (except for Met Office, which significantly underestimates high cloud occurrence)

Figure panels: LOR; HSS; SEDS

Skill versus height

• Most scores not reliable near the tropopause because cloud fraction tends to zero
• New score reveals:
  – Skill tends to slowly decrease at tropopause
  – Mid-level clouds (4-5 km) most skilfully predicted, particularly by Met Office
  – Boundary-layer clouds least skilfully predicted

Figure panels: LOR; HSS; LBSS; SEDS

A surprise?

• Is mid-level cloud well forecast???
  – Frequency of occurrence of these clouds is commonly too low (e.g. from Cloudnet: Illingworth et al. 2007)
  – Specification of cloud phase cited as a problem
  – Higher skill could be because large-scale ascent has largest amplitude here, so cloud response to large-scale dynamics most clear at mid levels
  – Higher skill for Met Office models (global and mesoscale) because they have arguably the most sophisticated microphysics, with separate liquid and ice water content (Wilson and Ballard 1999)?
• Low skill for boundary-layer cloud is not a surprise!
  – Well known problem for forecasting (Martin et al. 2000)
  – Occurrence and height a subtle function of subsidence rate, stability, free-troposphere humidity, surface fluxes, entrainment rate...

Skill versus lead time

• Only possible for UK Met Office 12-km model and German DWD 7-km model
  – Steady decrease of skill with lead time
  – Both models appear to improve between 2004 and 2007
• Generally, UK model best over UK, German best over Germany
  – An exception is Murgtal in 2007 (UK model wins)

Figure panels: 2004; 2007

Forecast “half life”

• Fit an inverse-exponential:
  – S1 is the score after 1 day and τ1/2 is the half-life
• Noticeably longer half-life fitted after 36 hours
  – Same thing found for Met Office rainfall forecast (Roberts 2008)
  – First timescale due to data assimilation and convective events
  – Second due to more predictable large-scale weather systems
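A hedged sketch of how such a half-life could be fitted. The slide's fitted formula is not reproduced in the text, so the functional form S(t) = S1 · 2^(-(t - 1)/τ1/2), with t in days, is an assumption chosen to match the stated meanings of S1 and τ1/2. The scores below are synthetic, not results from any model.

```python
import numpy as np

def fit_half_life(lead_days, scores):
    """Fit S(t) = S1 * 2**(-(t - 1) / tau) by linear regression on log2(S).

    Returns (S1, tau) where tau is the half-life in days.
    The functional form is an assumption, not taken from the slide.
    """
    t = np.asarray(lead_days, dtype=float)
    log2_s = np.log2(np.asarray(scores, dtype=float))
    slope, intercept = np.polyfit(t - 1.0, log2_s, 1)
    return 2.0 ** intercept, -1.0 / slope

# Synthetic skill scores with a known half-life (illustrative only)
lead = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0])
true_s1, true_tau = 0.45, 3.0
scores = true_s1 * 2.0 ** (-(lead - 1.0) / true_tau)

s1, tau = fit_half_life(lead, scores)
print(f"S1 = {s1:.3f}, half-life = {tau:.2f} days")
```

To look for the two timescales mentioned above, the same function could be applied separately to lead times before and after 1.5 days.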

Figure panels: Met Office and DWD, 2004 and 2007. Fitted half-lives annotated on the panels (in extracted order): 2.6, 2.9, 2.9, 2.7, 2.9, 2.7, 2.7, 3.1 and 2.4 days; 4.0, 4.3 and 4.3 days; 3.0, 3.2 and 3.1 days

Why is half-life less for clouds than pressure?

• Different spatial scales? Convection?
  – Average temporally before calculating skill scores:
  – Absolute score and half-life increase with number of hours averaged
• Cloud is noisier than geopotential height Z because it is separated by around two orders of differentiation:
  – Cloud ~ vertical wind ~ relative vorticity ~ ∇²streamfunction ~ ∇²pressure
  – Suggests cloud observations should be used routinely to evaluate models

Figure panels: Geopotential height anomaly; Vertical velocity

Alternative approach

• How valid is it to estimate 3D cloud fraction from 2D slice?
  – Henderson and Pincus (2009) imply that it is reasonable, although presumably not in convective conditions
• Alternative: treat cloud fraction as a probability forecast
  – Each time the model forecasts a particular cloud fraction, calculate the fraction of time that cloud was observed instantaneously over the site
  – Leads to a Reliability Diagram (Jakob et al. 2004)

Figure labels: Perfect; No skill; No resolution
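The reliability-diagram calculation described above can be sketched as follows: forecasts are binned by model cloud fraction, and for each bin the fraction of occasions on which cloud was observed is computed. The function name, the bin count and the synthetic data (constructed to be perfectly reliable) are illustrative assumptions.

```python
import numpy as np

def reliability_curve(f_forecast, cloud_observed, n_bins=10):
    """Return (mean forecast, observed frequency, count) per forecast bin."""
    f = np.asarray(f_forecast, dtype=float)
    o = np.asarray(cloud_observed, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Bin index 0..n_bins-1; a forecast of exactly 1.0 goes in the last bin
    idx = np.clip(np.digitize(f, edges) - 1, 0, n_bins - 1)
    counts = np.bincount(idx, minlength=n_bins)
    mean_forecast = np.full(n_bins, np.nan)
    obs_frequency = np.full(n_bins, np.nan)
    for k in range(n_bins):
        if counts[k] > 0:
            sel = idx == k
            mean_forecast[k] = f[sel].mean()
            obs_frequency[k] = o[sel].mean()
    return mean_forecast, obs_frequency, counts

# Synthetic, perfectly reliable forecasts: cloud occurs with probability f
rng = np.random.default_rng(0)
f = rng.uniform(0.0, 1.0, 20000)
observed = rng.uniform(0.0, 1.0, f.size) < f

mean_fc, obs_freq, counts = reliability_curve(f, observed)
print(np.round(mean_fc, 2))
print(np.round(obs_freq, 2))
```

For a perfect forecast the two printed rows should agree to within sampling noise, tracing the diagonal of the diagram.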

Satellite observations: IceSAT

• Cloud observations from IceSAT 0.5-micron lidar (first data Feb 2003)
• Global coverage but lidar attenuated by thick clouds: direct model comparison difficult
  – Optically thick liquid cloud obscures view of any clouds beneath
• Solution: forward-model the measurements (including attenuation) using the ECMWF variables
• Simulate lidar backscatter:
  – Create subcolumns with max-rand overlap
  – Forward-model lidar backscatter from ECMWF water content & particle size
  – Remove signals below lidar sensitivity

Figure: lidar apparent backscatter coefficient (m^-1 sr^-1) versus latitude
Figure panels: ECMWF raw cloud fraction; ECMWF cloud fraction after processing; IceSAT cloud fraction
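A simplified sketch of the last two simulator steps (attenuation and the sensitivity cut) for a single subcolumn viewed from above. It is not the simulator used in the study: it assumes single scattering, a fixed extinction-to-backscatter ratio, and a hypothetical sensitivity threshold. Subcolumn generation and the conversion from water content and particle size to extinction are not shown.

```python
import numpy as np

def simulate_lidar(extinction, dz, lidar_ratio=20.0, sensitivity=1e-6):
    """Attenuated backscatter for layers ordered from the top downwards.

    extinction  : visible extinction coefficient per layer (m^-1)
    dz          : layer thickness (m)
    lidar_ratio : extinction-to-backscatter ratio (sr), assumed constant
    sensitivity : hypothetical detection threshold (m^-1 sr^-1)
    """
    alpha = np.asarray(extinction, dtype=float)
    layer_tau = alpha * dz
    # Optical depth from the lidar down to the middle of each layer
    tau_to_mid = np.cumsum(layer_tau) - 0.5 * layer_tau
    beta = alpha / lidar_ratio
    beta_att = beta * np.exp(-2.0 * tau_to_mid)  # two-way attenuation
    return beta_att, beta_att > sensitivity

# Illustrative profile: thin ice cloud, thick liquid layer, low cloud beneath
extinction = [1e-4, 1e-4, 0.0, 0.02, 0.02, 0.0, 0.01, 0.0]
beta_att, detected = simulate_lidar(extinction, dz=100.0)
model_cloud = np.asarray(extinction) > 0.0

print("model cloud:", model_cloud.astype(int))
print("detected:   ", detected.astype(int))
```

In this example the low cloud beneath the thick liquid layer is present in the model profile but falls below the assumed sensitivity, so it would be removed before comparison with the observations.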

Global cloud fraction comparison

Figure panels: ECMWF raw cloud fraction; ECMWF processed cloud fraction; IceSAT cloud fraction

Wilkinson, Hogan, Illingworth and Benedetti (MWR 2008)

• Results for October 2003
  – Tropical convection peaks too high
  – Too much polar cloud
  – Elsewhere agreement is good
• Results can be ambiguous
  – An apparent low cloud underestimate could be a real error, or could be due to high cloud above being too thick

Testing the model climatology

Figure annotations:
  – Reduction in model due to lidar attenuation
  – Error due to uncertain extinction-to-backscatter ratio

Testing the model skill from space

• Clearly need to apply SEDS to cloud estimated from lidar & radar!
• Lowest skill: tropical boundary-layer clouds
• Tropical skill appears to peak at mid-levels but cloud very infrequent here
• Highest skill in north mid-latitude and polar upper troposphere
• Is some of the reduction of skill at low levels because of lidar attenuation?

Figure annotation: Unreliable region

Ice cloud retrievals from the A-train

• Advantages of combining radar, lidar and radiometers
  – Radar Z ∝ D^6, lidar β′ ∝ D^2, so the combination provides particle size
  – Radiances ensure that the retrieved profiles can be used for radiative transfer studies
• How do we combine them optimally?
  – Use a “variational” framework: takes full account of observational errors
  – Straightforward to add extra constraints and extra instruments
  – Allows seamless retrieval between regions of different instrument sensitivity
• Retrievals will be compared to Met Office and ECMWF forecasts under the A-train

Formulation of variational scheme

For each ray of data we define:

• Observation vector y
  – Attenuated lidar backscatter profile
  – Radar reflectivity factor profile (on different grid)
  – Visible optical depth
  – Infrared radiance
  – Radiance difference
• State vector x
  – Ice visible extinction coefficient profile
  – Ice normalized number conc. profile
  – Extinction/backscatter ratio for ice
  – (TBD) Aerosol visible extinction coefficient profile
  – (TBD) Liquid water path and number conc. for each liquid layer
• Elements may be missing
• Logarithms prevent unphysical negative values

Solution method

• An iterative method is required to minimize the cost function

1. New ray of data
  – Locate cloud with radar & lidar
  – Define elements of x
  – First guess of x
2. Forward model
  – Predict measurements y from state vector x using forward model H(x)
  – Predict the Jacobian $H = \partial y_i / \partial x_j$
3. Has solution converged?
  – $\chi^2$ convergence test
  – No: take a Gauss-Newton iteration step (4), then return to the forward model (2)
  – Yes: calculate error in retrieval, then proceed to next ray
4. Gauss-Newton iteration step
  – Predict new state vector: $x_{k+1} = x_k + A^{-1}\{H^T R^{-1}[y - H(x_k)] - B^{-1}(x_k - b) - T x_k\}$
  – where the Hessian is $A = H^T R^{-1} H + B^{-1} + T$
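The iteration above can be sketched with NumPy on a toy problem. The update and Hessian follow the formulas on the slide; everything else is an assumption made for illustration: the toy forward model (observations are linear combinations of exp(x), mimicking a log-space state vector), the error covariances, the step-size convergence test used in place of the χ² test, and the use of A⁻¹ as the retrieval error covariance.

```python
import numpy as np

def gauss_newton(y, forward, jacobian, R_inv, B_inv, b, T, x0,
                 max_iter=20, tol=1e-10):
    """Iterate x_{k+1} = x_k + A^-1 {H^T R^-1 [y - H(x_k)] - B^-1 (x_k - b) - T x_k}."""
    x = np.array(x0, dtype=float)
    for _ in range(max_iter):
        H = jacobian(x)
        A = H.T @ R_inv @ H + B_inv + T  # Hessian
        rhs = H.T @ R_inv @ (y - forward(x)) - B_inv @ (x - b) - T @ x
        dx = np.linalg.solve(A, rhs)
        x = x + dx
        if np.max(np.abs(dx)) < tol:  # simple stand-in for the chi-squared test
            break
    return x, np.linalg.inv(A)  # state and an estimate of its error covariance

# Toy forward model (hypothetical): y = K exp(x)
K = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 1.0, 1.0]])
forward = lambda x: K @ np.exp(x)
jacobian = lambda x: K * np.exp(x)  # element [i, j] = dy_i/dx_j

x_true = np.array([0.0, 0.5, -0.3])
y = forward(x_true)                # noise-free synthetic observations
R_inv = np.eye(4) / 0.01**2        # inverse observation error covariance
B_inv = np.eye(3) / 10.0**2        # weak a priori constraint
b = np.zeros(3)                    # a priori state
T = np.zeros((3, 3))               # no smoothness constraint in this toy case

x_hat, x_cov = gauss_newton(y, forward, jacobian, R_inv, B_inv, b, T, x0=b)
print(np.round(x_hat, 4))
```

Because the a priori constraint is weak and the observations are noise-free, the retrieved state should be very close to `x_true`; tightening `B_inv` would pull it towards `b`.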

CloudSat-CALIPSO-MODIS example

Figure panels (scale bar: 1000 km): Lidar observations; Lidar forward model; Radar observations; Radar forward model

Radar-lidar retrieval

• Extinction coefficient
• Ice water content
• Effective radius

Figure panels: MODIS 10.8-µm observations; Forward model

…add infrared radiances

• Radiances matched by increasing extinction near cloud top

Figure panels: MODIS 10.8-µm observations; Forward model

Radar-lidar complementarity

Figure panels: CloudSat radar; CALIPSO lidar; MODIS 11 micron channel; Retrieved extinction (m^-1)
Axes: Time since start of orbit (s); Height (km)
Figure annotations:
  – Cirrus detected only by lidar
  – Mid-level liquid clouds
  – Deep convection penetrated only by radar

Comparison with ECMWF

Figure: log10(IWC [kg m^-3]) versus temperature (°C), A-Train

Comparison with model IWC

• Global forecast model data extracted underneath A-Train
• A-Train ice water content averaged to model grid
  – Met Office model lacks observed variability
  – ECMWF model has artificial threshold for snow at around 10^-4 kg m^-3

Figure panels: Met Office; ECMWF (vertical axis: temperature, °C)

Summary and outlook

• Defined five key properties of a good skill score
  – Plenty of bad scores are used (hit rate, false-alarm rate etc.)
  – New “Symmetric extreme dependency score” is equitable and nearly independent of the occurrence of the quantity being forecast
• Model comparisons reveal
  – Half-life of a cloud forecast is between 2.5 and 4 days, much less than ~9 days for ECMWF 500-hPa geopotential height forecast
  – Longer-timescale predictability after 1.5 days
  – Higher skill for mid-level cloud and lower for boundary-layer cloud
  – Proposal submitted to apply some of these metrics (including probabilistic ones) to NWP & single-column models over the ARM sites
• Further work with radar-lidar-radiometer retrieval
  – Being used to test new ice cloud scheme in ECMWF model, as well as high-resolution simulations of tropical convection in “Cascade” project
  – Retrieve liquid clouds and precipitation at the same time to provide a truly seamless retrieval from the thinnest to the thickest clouds
  – Adapt for EarthCARE satellite (ESA/JAXA: launch 2013)

Cloud fraction in 7 models

• Mean & PDF for 2004 for Chilbolton, Paris and Cabauw (Illingworth et al., BAMS 2007)
• 0-7 km
  – Uncertain above 7 km as must remove undetectable clouds in model
  – All models except DWD underestimate mid-level cloud
  – Some have separate “radiatively inactive” snow (ECMWF, DWD); Met Office has combined ice and snow but still underestimates cloud fraction
  – Wide range of low cloud amounts in models
  – Not enough overcast boxes, particularly in Met Office model

Contingency tables

Comparison with Met Office model over Chilbolton, October 2003

                     Observed cloud     Observed clear-sky
Model cloud          A: Cloud hit       B: False alarm
Model clear-sky      C: Miss            D: Clear-sky hit

Monthly skill versus time

• Measure of the skill of forecasting cloud fraction > 0.05
  – Comparing models using similar forecast lead time
  – Compared with the persistence forecast (yesterday’s measurements)
• Lower skill in summer convective events

Why N0*/α^0.6?

• In-situ aircraft data show that N0*/α^0.6 has temperature dependence that is independent of IWC
• Therefore we have a good a-priori estimate to constrain the retrieval
• Also assume vertical correlation to spread information in height, particularly to parts of the profile detected by only one instrument

Why N0*???

• We need to be able to forward model Z and other variables from x
• Large scatter between extinction and Z implies 2D lookup-table is required
• When normalized by N0*, there is a near-unique relationship between α/N0* and Z/N0* (as well as re, IWC/N0* etc.)

Ice cloud: non-variational retrieval

• Donovan et al. (2000) algorithm can only be applied where both lidar and radar have signal
• Retrieval is accurate but not perfectly stable where lidar loses signal

Figure panels: Observations; State variables; Derived variables (aircraft-simulated profiles with noise, from Hogan et al. 2006)

Variational radar/lidar retrieval

• Noise in lidar backscatter feeds through to retrieved extinction
  – Lidar noise matched by retrieval
  – Noise feeds through to other variables

Figure panels: Observations; State variables; Derived variables

…add smoothness constraint

• Smoothness constraint: add a term to cost function to penalize curvature in the solution ($J' = \lambda \sum_i (d^2\alpha_i/dz^2)^2$)
  – Retrieval reverts to a-priori N0
  – Extinction and IWC too low in radar-only region

Figure panels: Observations; State variables; Derived variables
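A curvature penalty of this kind is commonly implemented as a matrix built from second differences. The sketch below constructs such a matrix so that the quadratic form xᵀ T x equals the weighted sum of squared second derivatives of a profile x. The function name, the uniform grid and the scaling convention (no factor of 1/2) are assumptions, and the retrieval's own matrix may be defined differently.

```python
import numpy as np

def curvature_penalty_matrix(n, dz, weight=1.0):
    """Return T such that x @ T @ x = weight * sum_i (d2x_i/dz2)**2.

    Uses centred second differences on a uniform grid of spacing dz.
    """
    D = np.zeros((n - 2, n))
    for i in range(n - 2):
        D[i, i:i + 3] = np.array([1.0, -2.0, 1.0]) / dz**2
    return weight * D.T @ D

n, dz = 6, 100.0
T = curvature_penalty_matrix(n, dz)
z = np.arange(n) * dz

linear_profile = 2.0 + 0.01 * z   # no curvature, so no penalty
curved_profile = 1e-4 * z**2      # constant second derivative of 2e-4

penalty_linear = linear_profile @ T @ linear_profile
penalty_curved = curved_profile @ T @ curved_profile
print(penalty_linear, penalty_curved)
```

A linear profile incurs essentially no penalty, so a constraint of this form smooths out noise without forcing the solution towards any particular value or gradient.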

…add a-priori error correlation

• Use B (the a priori error covariance matrix) to smooth the N0 information in the vertical
  – Vertical correlation of error in N0
  – Extinction and IWC now more accurate

Figure panels: Observations; State variables; Derived variables

Effective radius versus temperature

All clouds

An effective radius parameterization?

Comparison of mean effective radius

• July 2006 mean value of $r_e = 3\,\mathrm{IWP}/(2\rho_i\tau)$ from CloudSat-CALIPSO only

• Just the top 500 m of cloud

• MODIS/Aqua standard product

Comparison of ice water path

Figure panels: Mean of all skies; Mean of clouds (columns: CloudSat-CALIPSO; MODIS)

• Need longer period than just one month (July 2006) to obtain adequate statistics from poorer sampling of radar and lidar

Comparison of optical depth

Figure panels: Mean of all skies; Mean of clouds (columns: CloudSat-CALIPSO; MODIS)

• Mean optical depth from CloudSat-CALIPSO is lower than MODIS simply because CALIPSO detected many more optically thin clouds not seen by MODIS

• Hence need to compare PDFs as well
