
Clinicians’ views of formats of performance comparisons

Dominique Allwood MBBS BSc MSc MFPH,1 Zoe Hildon MA PhD2 and Nick Black MD FFPH3

1Honorary Research Fellow, 2Lecturer in Social Research, 3Professor of Health Services Research, Department of Health Services Research & Policy, London School of Hygiene & Tropical Medicine, London, UK

Keywords

clinicians’ views, compositional format, data presentation, performance comparisons

Correspondence

Prof Nick Black
Professor of Health Services Research
Department of Health Services Research & Policy
London School of Hygiene & Tropical Medicine
15-17 Tavistock Place
London WC1H 9SH
E-mail: [email protected]

Accepted for publication: 21 July 2011

doi:10.1111/j.1365-2753.2011.01777.x

Abstract

Rationale, aims and objectives Comparisons of the performance of health care providers are increasingly being used. Despite one key audience being clinicians, there has been little research on the format and content of such comparisons. Our aim was to explore clinicians’ comprehension of, and preferences for, format and content in displaying provider outcomes, using comparisons of patient reported outcome measures data.

Method A qualitative study, based on seven meetings involving 107 clinicians (mostly consultant and junior doctors, and nurses), revealed their views on nine formats and five aspects of content.

Results Key findings were the desire for data in more than one format, explicit display of comparative performance (rank order) and the need for explanations (e.g. of unfamiliar formats and of statistical uncertainty).

Conclusions Several themes were identified that shaped clinicians’ views. Results were sufficiently clear to permit recommendations for the form and content of standard reports for the National Health Service.

Introduction

Data comparing the performance of health care providers are increasingly being made available in many countries. One of the key audiences for such information is clinicians, in the hope that such data will stimulate them to improve the quality of the care they provide [1]. It is therefore important that comparative outcomes data are displayed in ways that will maximize their impact on clinicians [2]. Two features of presentations need to be considered: compositional format and aspects of content. The former refers to the way data are housed and includes options such as bar charts, tables and funnel plots. Aspects of content include scaling, statistical uncertainty and framing.

Presentational displays of performance information for clinicians are often being adopted in the absence of much research evidence on the best format and content to use. A recent systematic review of the impact of the format and content of displays of quantitative data found only one study in which clinicians were the main audience. That study suggested that, as regards format, clinicians liked tables (with numbers), finding them easier to understand than bar charts, which were more likely to lead to inaccurate decisions [3]. Icons were found to be easiest to understand but were not liked by clinicians. Another study presented two formats of performance data to health service commissioners [4], some of whom had a clinical background. Despite bar charts being the preferred format, they were found to be less well understood than scatter-plots. However, as the bar charts also displayed confidence intervals (CIs), there also appeared to be a lack of understanding of uncertainty and the role of CIs [4].

The review found 28 other studies in which the audiences were members of the public, students or patients [3]. None of the studies in the review considered caterpillar or funnel plots, and many of the topics studied did not relate to health care. The five principal generic themes to emerge were: the more complex the data, the less easily they are understood [5]; interpretation of bar charts is less accurate than that of tables [4,6–8]; explanatory cues aid understanding [6,9–11]; ordered data (e.g. ranking) are easier to understand than unordered [12,13]; and indicators of statistical certainty (e.g. CIs) are often not well understood [4,6,11,14–16].

The need for a better understanding of clinicians’ views has taken on added importance in England with the introduction of patient reported outcome measures (PROMs) for assessing the quality of health care. Initially, their mandatory use in the National Health Service (NHS) is confined to four elective surgical operations (hip and knee replacement, hernia repair and varicose vein surgery) [17]. Several aspects of patients’ health status before and after surgery are collected from questionnaires completed by patients. Comparisons of providers are adjusted for case-mix (age, sex, severity, co-morbidity, general health status).
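For readers unfamiliar with case-mix adjustment, the sketch below illustrates the general idea: regress observed change scores on case-mix variables, then compare each provider’s observed mean change with its case-mix-predicted mean. This is a minimal sketch under invented assumptions (variable names, coefficients and data are all illustrative), not the NHS risk-adjustment methodology itself, which is more elaborate.

```python
# Minimal, illustrative sketch of case-mix adjustment (NOT the NHS method):
# fit an ordinary least squares model of change score on case-mix variables,
# then compare observed and predicted mean change for each provider.
import numpy as np

rng = np.random.default_rng(0)
n = 1000
age = rng.normal(68, 10, n)                        # invented case-mix variables
male = rng.integers(0, 2, n)
baseline = rng.normal(18, 8, n)                    # pre-operative PROM score
provider = rng.integers(0, 5, n)                   # five hypothetical providers
change = 25 - 0.4 * baseline + rng.normal(0, 9, n) # post minus pre PROM score

X = np.column_stack([np.ones(n), age, male, baseline])
beta, *_ = np.linalg.lstsq(X, change, rcond=None)  # ordinary least squares fit
expected = X @ beta

for p in range(5):
    m = provider == p
    # positive gap = provider does better than its case-mix predicts
    print(p, round(change[m].mean() - expected[m].mean(), 2))
```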


Our aim was to explore clinicians’ comprehension and preferences as regards the format and content of comparisons based on PROMs data, so as to inform the design of routine reports for the NHS. To understand clinicians’ views, a qualitative approach was adopted, based on group meetings in a representative range of NHS hospitals.

Method

A pragmatic approach to the meetings as regards setting, duration and attendance was adopted, given we had to fit in with clinicians’ limited availability. An iterative approach was taken in which presentations were modified as the fieldwork progressed.

Participants

The six meetings held at hospitals were supplemented with one at a national conference for clinicians involved in pre-operative assessment. The hospitals were selected from those that had participated in the Patient Outcomes in Surgery (POiS) audit in 2008–2009, as those sites had demonstrated an interest in the topic. Seven meetings were judged sufficient to reach saturation of findings. Of the 107 people who participated (Table 1), consultants were present at all meetings, junior doctors at four meetings and nurses or allied health professionals at five meetings.

Topics covered in meetings

Published literature informed the choice of formats and aspects of content considered (Table 2). Formats for single outcomes (such as the mean change in PROM score) included: numerical tables, bar charts, caterpillar plots, funnel plots and tables with icons. Formats for displaying multiple outcomes included: numerical tables (scorecards), multiple bar charts, radar plots and two-dimensional (2D) axes. Aspects of content included ways of displaying statistical uncertainty, scaling, ordering of data, framing and the number of providers to include.

As there were many possible combinations of formats and content for each of the four operations, a manageable number had to be created, focused on one operation (hip surgery), for use in all meetings irrespective of the clinicians’ specialty. A graphic designer created the charts using real data that had been anonymized. Fictitious hospital names were assigned to indicate the hospital where the meeting was held and the four nearest providers. Different fictitious names were assigned on different charts to avoid participants’ views of one display being influenced by a preceding chart. Although there was not enough time to show every chart in each meeting, each one was discussed on at least two occasions.

Data collection and analysis

Meetings were audio-recorded, having obtained the consent of participants. Each meeting was facilitated by one of the authors, accompanied by an observer who took notes. Meetings lasted about 1 hour and were based on a PowerPoint presentation that started with a brief introduction to PROMs and the project. The facilitator sought the participants’ views, their understanding and their preferences for different options, based on the issues listed in Table 2.

Table 1 Description of participants

Meeting | Specialty | Type of meeting | Number of participants | Consultants | Junior doctors | Nurses and AHPs | Others*
1 | Orthopaedic surgery | Departmental clinical governance meeting | 7 | 4 | 3 | – | –
2 | Pre-operative assessment/anaesthetics | Session at national conference | 17 | 5 | – | 10 | 2
3 | General surgery | Specially arranged meeting | 7 | 4 | – | 1 | 2
4 | Orthopaedic surgery | Departmental clinical governance meeting | 30 | 5 | 16 | 9 | –
5 | General surgery/orthopaedic surgery/anaesthetics | Specially arranged meeting | 6 | 2 | – | 4 | –
6 | General surgery/care of the elderly | Hospital-wide teaching meeting | 20 | 4 | 16 | – | –
7 | Orthopaedic surgery | Departmental clinical governance meeting | 20 | 5 | 9 | 4 | 2
Total | | | 107 | 29 | 44 | 28 | 6

*Managers, administrators, IT staff, clinical audit staff. AHP, allied health professional.

The recordings were transcribed verbatim, and the transcripts were independently analysed by all authors. Views on each format and on each aspect of content were collated from across the seven meetings and summarized (with illustrative quotes) to reflect the range of views and provide an indication of the extent of consensus or divergence. Subsequently, thematic analysis was undertaken to identify the key underlying concepts and perceptions that informed clinicians’ views. In both the descriptive and the thematic analysis, there was a high level of agreement between authors; where differences occurred, a consensus was achieved through discussion.

Results

Formats for single outcomes

Tables (with numbers)

Many clinicians favoured numerical tables as they provided information that was perceived as neutral and allowed manipulation:

I think if the table is just to present the data, you want this data for the few people who do actually want to look at it and use it and do things with it. So you just need to then have data there without bias in it, without other interpretation in it.

In contrast, others thought tables were cluttered and lacked visual clarity:

You’re dealing with differences of 2.4 and it’s just not enough to see – and the confidence interval’s the same. It’s looking at the numbers and it’s very difficult to see and get your head around it.

Numerical tables were also seen as difficult to use for comparing providers, and comparisons were inevitably limited to only a few providers:

My reservation on the tables . . . is they don’t easily give you an idea of the actual spread across the whole lot of providers . . . It doesn’t give you an idea of what the difference between top and bottom is and . . . the distribution around that average.

Bar charts

Bar charts were widely liked as they were familiar, clear and facilitated comparisons:

As long as the institution we work for is able to be picked out from that, which is usually where the colour comes in or a label, that’s absolutely perfect . . . It gives you much more than the table, in the sense that you can tell whether you’re top ten or top quartile or top whatever, can’t you?

However, when CIs were included, it was apparent that there was a lack of understanding of how they should be interpreted, and a need was expressed for an explanation to be provided:

On this chart you would have an explanation about why apparently some of them below the first one, which is worse than average, are not [labelled] worse than average and you’d have to have some explanation.

Caterpillar plots

Some clinicians found caterpillar plots clearer than bar charts with CIs. This facilitated rapid comparisons of providers, which could be supplemented by referring to numerical tables:

The caterpillar would be the right one. At least, if you are below par and you want to see how far you are below or how far you are above the rest . . . Just as a quick shot, you know where you are, where you’re standing compared to your other hospitals.
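As a concrete illustration of this format, the sketch below draws a caterpillar plot: providers ranked by mean change score, each with a 95% CI, against the national mean. It is a minimal sketch using simulated data; the charts shown to participants were built by a graphic designer from anonymized PROMs data.

```python
# Illustrative caterpillar plot: providers ordered by mean change in PROM
# score, with 95% confidence intervals. All figures below are simulated.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n_providers = 30
means = rng.normal(20, 2, n_providers)     # mean change in PROM score
ns = rng.integers(50, 400, n_providers)    # patients per provider
ses = 9.0 / np.sqrt(ns)                    # SE, assuming a common SD of 9
order = np.argsort(means)                  # rank providers by point estimate

x = np.arange(n_providers)
plt.errorbar(x, means[order], yerr=1.96 * ses[order], fmt='o', capsize=2)
plt.axhline(means.mean(), linestyle='--', label='National mean')
plt.xlabel('Providers (ranked by mean change score)')
plt.ylabel('Mean change in PROM score')
plt.legend()
plt.show()
```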

Funnel plots

Funnel plots generated mixed views. Support was based on the ease with which data could be appraised, the inclusion of data on the number of procedures carried out by a provider (which facilitated comparisons of providers with similar volumes) and the effective way funnel plots could be used to monitor secular trends in their own performance:

I like it because absolutely immediately you can see . . . what the mean is and you can also quite easily get an idea of the volume of each provider.

Table 2 Formats and aspects of content addressed in meetings

Format

How should data be presented?
• Table with numbers
• Bar chart (with and without confidence intervals)
• Caterpillar plot
• Funnel plot
• Table with icons

How should multiple outcomes be combined?
• Table (scorecard)
• Grouped bar chart
• Spider (radar) plot
• 2D axes

Content

How should uncertainty be displayed?
• Bar chart with confidence intervals vs bar chart without vs caterpillar plot

How should results be framed?
• Good news or bad news

What should be the extent of scales?
• Full scale vs restricted scale

In what order should providers be shown?
• Absolute score
• Statistical certainty
• Alphabetical
• Horizontal vs vertical bar chart

How many providers should be shown?
• Local (5)
• Regional (20)
• National (all)


The other thing about this is that if you plot yourself on this on successive years, you can see whether you’re moving in a direction. So you can see whether you’re drifting upwards or downwards against the mean, which is, I think, a really, really useful thing to be able to do as a clinician looking at your data over the years.

In contrast, those not familiar with the format were put off by their apparent complexity:

I find this a difficult format. Too many things in there. The other [formats] are much easier to figure out.
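For readers unfamiliar with the format, the sketch below shows how a funnel plot of this kind can be constructed: provider means plotted against volume, with control limits that narrow as volume increases. The data and the choice of 95%/99.8% limits (roughly 2 and 3 SDs) are illustrative assumptions, not the published PROMs specification.

```python
# Illustrative funnel plot: each provider's mean change score against its
# volume, with control limits national_mean +/- z * sd / sqrt(volume).
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
ns = rng.integers(50, 1000, 120)                  # procedures per provider
national_mean, sd = 20.0, 9.0                     # assumed national figures
means = rng.normal(national_mean, sd / np.sqrt(ns))

vols = np.linspace(ns.min(), ns.max(), 200)
for z, style in [(1.96, '--'), (3.09, ':')]:      # ~95% and ~99.8% limits
    plt.plot(vols, national_mean + z * sd / np.sqrt(vols), 'k' + style)
    plt.plot(vols, national_mean - z * sd / np.sqrt(vols), 'k' + style)

plt.scatter(ns, means, s=12)
plt.axhline(national_mean, color='k')
plt.xlabel('Number of procedures')
plt.ylabel('Mean change in PROM score')
plt.show()
```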

Tables (with icons)

Clinicians had few favourable things to say about tables with icons (specifically, with stars). However, a few thought their visual clarity made them easy to appraise at a glance. There were several reasons why most clinicians were critical. First, it was felt they provided insufficient information. Second, although the assignment of stars was based on the statistical certainty of whether or not a provider’s performance differed from the mean for England, many clinicians assumed that the method of assigning stars was arbitrary or even random, lacking transparency and neutrality:

Somebody else is making their mind up for you about how they’re presenting your data and you’ve got no numbers behind it. It suggests that [hospital A with four stars] is 30% better than anybody else [with three stars], which the data doesn’t say.

A third concern was that star ratings gave no indication of how close to or far from a ‘boundary’ between categories a provider lay:

Giving most of them three stars and somebody, sort of fairly randomly, just a little bit more get four stars. That’s big . . . in visual terms. That’s a huge jump, which isn’t represented by the data.

Overall, the use of stars was seen as too popularist. Their use was likened to assessments used in newspapers for holidays and restaurants:

I think stars, awful. I think you can thank Mr Blair for those being bloody terrible . . . Hopefully most people in this room don’t read the Daily Mail or The Sun and for that reason alone I think it is ludicrously simplistic.

Formats for multiple outcomes

Most clinicians felt it was beneficial to show several outcome measures together. Numerical tables (scorecards) were favoured as they provided more data than other formats, in a neutral form and in considerable depth:

I personally would just like that table and I’m not bothered about all the rest of the diagrams . . . If you’re really drilling down into this, you need the absolute figures and I think it’s helpful to have your colour coding for above and below average.

However, scorecards were seen as having a few disadvantages. Some clinicians noted problems with using them to compare themselves with other providers, as rank order was not clear. Another criticism, that they were not accessible, prompted the suggestion that each metric could be ‘scored’ and the scores combined to give an overall rank of providers, although it was acknowledged this would negate their ‘neutrality’:

If you were doing it like Which? magazine, you’d have a number of scores that were stars out of five and then you’d have an overall score . . . So you’ve got four lots of stars out of five and then a total score, which is your Which? Best Buy.

Grouped bar charts were familiar and enabled rapid appraisal of the spread of data:

I think a lot of people would be used to seeing these type of data, so a lot of clinicians . . . would understand this sort of information and there is a comparison also there and four different types, so it’s more understandable.

However, a few people remarked that grouping several variables in the bar chart impeded cognitive processing: with so much going on, it lacked visual clarity.

Spider plots were familiar to and understood by clinicians involved with the National Joint Register. Of those who were not familiar, some quickly appreciated the benefits. However, others remained critical, suggesting that spider plots lacked clarity, were inevitably limited to comparisons of only a few providers, and that the scaling/plotting could lack sensitivity in showing differences between providers, as well as making it difficult to indicate statistical significance:

Four or five . . . if it is more than that, there’s no way you are going to see where it is . . . and there are so many parameters you are trying to study.

The 2D axes were the least favoured format, partly because it was the least known, often causing confusion on initial viewing, and partly because it was seen to lack sensitivity in showing differences between providers:

It just looks the same and whether there are real differences or not, there aren’t actually very much differences between those two graphs . . . they virtually look the same.

Some clinicians were critical of all multiple outcome formats, which they felt lacked visual clarity and gave each outcome the same weight:

I don’t think it’s useful to have it. I think it’s confusing. And I think it’s much clearer using the same format and just doing it with each parameter.

I guess the issue is that what’s being proposed is that this (disease-specific PROM) has as much weighting [as post-operative problems]. I think that in some simple to read report [a disease-specific PROM] is much more important . . . That’s the problem, this seems to be just as important and actually we’re all saying it probably isn’t.

Another concern was that the multiple outcome formats might discourage closer inspection of the component outcomes:

You have the problem that people will tend to go for the one chart that would give them the most information without looking at all the others. So if you put four bits of information on it this will become the main thing that’s looked at by the clinician because they’ll skip all the rest and say this has got all the information we need.

Content

Uncertainty

Uncertainty was displayed in several ways: CIs, highlighting statistical differences by different colours and, for one format, using icons. Most clinicians agreed that data should only be shown with CIs, as they wanted to know if their institution was a statistically significant outlier so they could take corrective action:

It’s true though, that when you’re an outlier . . . all you really want to do is get back in the pack and be average again . . . You can aspire to being a green bar (better than average), and that’s what you should be doing. But just to get back in the pack is good enough.

However, some clinicians questioned the level of understanding among staff:

I’m not totally convinced that orthopaedic surgeons would necessarily know what a confidence interval was. Of course I think most people this day and age should understand it, I would hope.

If you’re not used to dealing with them you don’t really understand what that actually means to the data. The concept that one number can be higher than another but actually worse is very difficult to explain.

Despite general support for including CIs, this was not a universal view. Some challenged both the need for them and their importance relative to the clinical significance of any observed differences:

You’ve chosen a 95% confidence interval which is quite an arbitrary thing and yet it’s taken as gospel all over the place. I would rather go to [hospital A] than [hospital B] even though at 95% confidence levels there’s no statistical difference because it’s still more likely than not, so not a criminal level, that [hospital A] is better than [hospital B].

A lot of the things that come back to us, like Dr Foster reports and deaths and complications tables and things like that, are questionable in a similar way. If you’re 4.2% and the national average is 4.3% are you really better or worse?

Some clinicians wanted explanations of how to interpret statistical uncertainty, particularly on bar charts, or clear colour coding or labels (but not the addition of icons). Of those who wanted CIs to be included, some felt that this would be best achieved in numerical tables:

I’ve spent time sitting with graphs trying to work out what the bloody confidence intervals are and reading them off to a decimal point . . . If you really want to know what a confidence interval is, it’s probably best that you have figures rather than just a bar on a chart.

Tables with icons were disliked because they did not explicitly show CIs. (In fact, the stars had been categorized by statistical certainty – e.g. four stars was 2 to 3 standard deviations better than average – but this had not been understood.)
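To make the two statistical ideas above concrete, here is a small sketch: the 95% CI for a provider’s mean change score, and one possible mapping from a provider’s distance to the national mean (in standard errors) to star bands. The text only specifies the 2–3 SD example for four stars, so the other thresholds below are assumptions for illustration.

```python
# Sketch of a 95% CI and an assumed star banding by distance from the
# national mean in standard errors (only the 4-star band is from the text).
import math

def ci_95(mean, sd, n):
    """95% confidence interval for a mean change score."""
    se = sd / math.sqrt(n)
    return mean - 1.96 * se, mean + 1.96 * se

def stars(provider_mean, national_mean, sd, n):
    """Map distance from the national mean, in SEs, to a 1-5 star band."""
    z = (provider_mean - national_mean) / (sd / math.sqrt(n))
    if z >= 3:
        return 5
    if z >= 2:
        return 4   # 2-3 SDs better than average (the example in the text)
    if z > -2:
        return 3   # statistically indistinguishable from the mean
    if z > -3:
        return 2
    return 1

print(ci_95(21.3, 9.0, 250))        # -> roughly (20.18, 22.42)
print(stars(21.3, 20.0, 9.0, 250))  # -> 4
```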

Scaling

While clinicians agreed on the need for consistency in the direction of axes, there was less agreement on the range of scaling. People recognized that the range could influence the impact that a chart might have: full scales may not expose good or poor performance, while restricted scales might magnify ‘real differences’. Many clinicians knew how scaling could emphasize differences but didn’t feel this need be detrimental, as it could increase the accuracy of understanding or prompt action:

It depends on how you put the scale because, obviously, if you want to emphasise the difference . . . you just reduce the scale and make the difference bigger. If you want to see that no one is too bad, because they are almost all average, you just put 100 and everybody will be just next to them. So it really depends on how you want to influence the audience.
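The scaling effect this participant describes is easy to demonstrate: the same provider scores look uniform on a full axis and sharply different on a restricted one. The scores and axis ranges below are invented for illustration.

```python
# The same five provider scores on a full versus a restricted y-axis:
# the restricted axis visually magnifies the differences between bars.
import matplotlib.pyplot as plt

scores = [19.2, 19.8, 20.1, 20.6, 21.4]   # five hypothetical local providers
fig, (ax_full, ax_zoom) = plt.subplots(1, 2)
for ax, ylim, title in [(ax_full, (0, 48), 'Full scale'),
                        (ax_zoom, (19, 22), 'Restricted scale')]:
    ax.bar(range(5), scores)
    ax.set_ylim(*ylim)
    ax.set_title(title)
plt.show()
```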

Ordering

Tables (with numbers or with icons) were shown ordered in three ways: alphabetically by name of provider, by absolute rank of the outcome and by statistical rank (where the highest value was not necessarily at the top). Some preferred ordering by alphabetical name as it both avoided creating a league table and made it easier for clinicians to find their hospital. Others, however, stated that it made it difficult to compare providers:

In order of outcome is preferable to alphabetical order because at the end of the day, it is sort of a league table and I think I would feel more comfortable having a league table.

For those who disliked alphabetical ordering, there were mixed views about whether to rank by score or by statistical significance:

If you rank people by performance then you’d have an issue about the confidence intervals because you may have somebody at the top who’s not necessarily the best because they’ve got very wide confidence intervals.

Framing

Clinicians were shown some outcomes ‘framed’ as ‘bad news’ (e.g. proportion of patients reporting a complication) or as ‘good news’ (proportion reporting no complications). While most clinicians preferred ‘good news’ as it focused on patients getting better, others felt it was more important to use familiar metrics and avoid double negatives:

I think if you are giving ‘no complications’ it probably makes, generally, people feel better because you’re always told how bad we are doing all the time. Probably that will feel at least somebody’s getting better.

[It’s] an issue of what you’re used to . . . and we’re used, as clinicians, we think of complication rates.

Number of providers

Many clinicians were primarily interested in information on local providers (plus national benchmarks), partly out of competitiveness and partly to identify sources of help and advice:

D. Allwood et al. Clinicians’ views of formats of performance comparisons

© 2011 Blackwell Publishing Ltd 5

Page 6: Clinicians' views of formats of performance comparisons

You’re most interested in how you do with your immediate neighbours.

We all want to know how the local hospitals are doing.

I think the value for this would be to focus on another hospital who is doing a lot better than you and perhaps go and ask them questions of how they are doing better [and] learn from the excellent end. And if they’re local, then that’s easier to do than if they’re not.

In contrast, others argued that local data were insufficient and they wanted to know how they compared nationally:

You only present five of them and on that you stand fourth. It’s different if you look over all England. You are probably 104th. So it doesn’t work, basically.

The absolute local ones has more implication for the patients who’ve got a choice of going to local hospitals, which is going to become an issue. But from our point of view, is not quite so much of an issue.

Some preferred the option of regional data, maybe supplemented with some indication of the national range, although they recognized the difficulty of defining regions. There was interest in national data for several reasons. First, it allowed clinicians to see the full range of performances, including the best and worst overall, and to look at more distant providers with which they were familiar:

We need to see the higher, whole national curves. What’s the point in comparing yourself to four local hospitals, if all the four hospitals are rubbish?

Two other options were suggested by clinicians: to show only those providers that were better or worse than average, or to limit comparisons to similar types of providers.

Underlying themes

Several inter-related themes appeared to shape clinicians’ views. Although some themes mostly concerned either the format or the content, most related to both or to the interplay between them. These themes help to explain clinicians’ understanding and preferences, described above.

More than one format

Tensions inevitably existed, given clinicians’ conflicting requirements: wanting visual clarity but also numerical detail; simple formats easily understood but also a recognition of the need for statistical uncertainty to be made apparent; a local focus and a national context. These were resolved by a reluctance to select only one format, instead recognizing the need for data to be presented in more than one way.

Familiarity with a format

Clinicians’ familiarity with a format had a positive impact on their preferences and understanding, sometimes overruling other concerns about a format’s shortcomings. However, those unfamiliar with a format were fairly flexible and ready to accept the unfamiliar as long as clear explanations were provided.

Capacity of a format to be tailored to an individual’s needs

The capacity of a format to provide scope for a clinician to interrogate the data and analyse them in different ways was valued by many clinicians. In this regard, numerical tables had an advantage over funnel plots and, for multiple outcomes, scorecards with numbers were preferred to spider plots.

‘Less is more’: aids cognitive processing

There was a widespread desire for visual clarity, in which uncomplicated formatting communicated more by virtue of its simplicity (e.g. bar charts and caterpillar plots trumped numerical tables). The greater the speed and ease with which clinicians could process information presented in a format, the stronger their preference for it (e.g. caterpillar plots enabled fast processing of provider comparisons whereas scorecards of multiple outcomes did not). However, taken to extremes (e.g. tables with icons), this led to a loss of credibility and rejection.

Uncertainty needs to be appropriately displayed and explained

Clinicians recognized the need to avoid misinterpretation of comparisons. Given the lack of universal understanding of CIs, alternative ways of displaying (e.g. colour highlighting) and explaining uncertainty were needed. These would provide transparency as to whether or not a provider was a statistically significant outlier, to inform corrective action. One consequence was a rejection of formats that did not encompass uncertainty (e.g. spider plots).

Emphasis on good news

There was a preference for framing data positively, either in terms of endorsing good news (e.g. proportion of patients improved) or using known scaling ‘tricks’ to focus on similarity between providers (e.g. use of full rather than restricted scales).

Consistency

Clinicians stressed the importance of maintaining consistency: axes showing ‘better health’ or greater improvement at the top; colours for indicating statistical differences from the overall mean (e.g. green for better, red for worse).

Desire to make comparisons

Clinicians endorsed the notion of comparing providers (and would also wish to compare the performance of individual clinicians) and welcomed the opportunity of identifying and learning from better performers. To that end, comparisons with local providers were favoured, as it would be easier to learn from those in proximity. However, comparisons across the country helped put local performance in the context of the national range.

Clinicians’ views of formats of performance comparisons D. Allwood et al.

© 2011 Blackwell Publishing Ltd6

Page 7: Clinicians' views of formats of performance comparisons

Accuracy rather than preference

Where there was a trade-off between their preference and correct understanding, clinicians recognized and favoured the greater importance of the latter as regards both format and content.

Discussion

Main findings

Despite some diversity of views, a clear pattern emerged to inform the provision of routine comparisons of provider outcomes for clinicians. Generally, they preferred formats with which they were familiar, although they recognized the advantages of novel formats (such as funnel plots) and felt that, once they had familiarized themselves, they would welcome the benefits that a new format offered. There was a desire to have data in more than one format. In general, an initial simple format was wanted, followed by the option of accessing greater detail and complexity.

Each format was seen as having both advantages and disadvantages. Tables with numbers provided the most information, made it easy to see the exact scores and CIs, and allowed clinicians to use the data for their own purposes. However, for others, the level of detail made this format difficult to comprehend and did not facilitate comparisons of providers.

Bar charts were familiar and made comparisons of providers easy. However, it was apparent that, despite their apparent simplicity, many clinicians misinterpreted the significance of differences between providers, partly because of the difficulty of including CIs without reducing the readability of the charts. The latter was not such a problem with caterpillar plots which, for this reason, were preferred by many clinicians.

Most clinicians were not familiar with funnel plots and, understandably, had difficulty comprehending them until they had been explained. As familiarity grew, they saw several advantages: the inclusion of the volume of patients allowed comparisons with similar sized providers; the simple, accessible way that statistical certainty was displayed facilitated the identification of outliers; and the potential over time to display secular trends.

While some clinicians liked the simplicity of tables with icons, most were strongly opposed to their use as they provided too little information and gave no indication of how close a provider was to a ‘boundary’ between two categories. However, this partly reflected a lack of understanding that categories were based on the statistical significance of differences from expected performance (the national mean) rather than, as many clinicians believed, on an arbitrary decision.

As regards displays of multiple outcomes, the advantages were recognized, although some of the means of doing so were found to be confusing, and there were concerns that clinicians might not go on to look at more detailed data on each outcome. Of the formats available, tables with numbers (scorecards) were familiar and provided exact scores and CIs. The only downside was that comparisons of providers were difficult, and ranking (wanted by many clinicians) was precluded. Multiple bar charts were also familiar, but the number of providers that could be included was more limited than with scorecards. Most clinicians were not familiar with spider plots (and none were familiar with 2D axes) and were concerned that the number of providers that could be included was very limited.

As regards content, common themes were the need for consistency, for explanations of formats and for clear labelling. The importance of indicating uncertainty about differences in scores by showing statistical significance (CIs) was generally accepted, although clinicians felt the clinical significance of differences also needed to be considered. Despite their acceptance of the importance of statistical significance, many clinicians still paid attention to the rank order of providers, despite most not differing significantly from the national mean.

While clinicians were aware of the effects that framing and scaling could have on perceptions of data comparisons, there was no consensus as to which methods to use. As regards how to order providers, there was interest in having both alphabetical lists (for easier identification of particular providers) and ranking. There was no consensus on whether to rank by score or by statistical difference from the national mean.

Most people wanted information on all providers in England. However, there was also strong support for data on the five local providers, supplemented with the national mean and the best and worst in the country. Despite the data having been risk adjusted, some clinicians wanted to compare themselves with what they perceived as similar providers as regards size, population served and facilities, as a means of achieving ‘fair’ comparisons.

Limitations of the study

We selected trusts from those that had volunteered to take part in the earlier POiS audit. Our sample may not, therefore, be representative of the views of all clinicians but may be biased towards those who are more supportive of the use of PROMs.

It was not possible to control the number of attendees, duration and room layout. This made some of the groups quite large and, inevitably, not everyone participated in the discussions. The heterogeneity of professions and grades that attended the meetings may have affected the views people were prepared to express (e.g. concern about the opinions of peers). It was reassuring that, as regards the medical staff, junior doctors participated as much as their senior colleagues. There was also no obvious reluctance by participants to admit to a lack of understanding. Although non-medical clinicians attended, they made few contributions, such that the majority of the views expressed were those of doctors. The method of data collection precluded the identification of the profession and grade of participants’ contributions, and therefore comparisons between such sub-groups were not feasible.

Implications

Overall, in many respects, clinicians’ views were similar to those of other groups (public, patients, students) that have been studied previously [3]. This is surprising given the much more extensive higher education doctors undergo, some of which includes training in quantitative methods. The main difference was a greater recognition of the need to consider statistical uncertainty, even if some clinicians’ conceptual grasp was poor. This suggests that explanatory cues to aid understanding are just as necessary as with other, less highly educated groups. Another difference was clinicians’ readiness to embrace new formats, which they recognized as having advantages over existing, familiar ones. Despite a pre-existing impression that clinicians were apprehensive about rank ordering (‘league tables’), there was a strong desire for such comparisons to be made available. In this regard, clinicians’ views were consistent with those of other groups, who also preferred ordered data.

This study has provided sufficient indication of clinicians’ needs to determine the likely best format and content for the reporting of their performance in the National PROMs Programme in England (Box 1). Remaining uncertainties, such as the relative merits of bar charts with CIs versus caterpillar plots for displaying rank order, and the use of full versus restricted scales, will require further research to resolve.

References

1. Davies, M., Powell, A. & Rushmer, R. (2007) Why don’t clinicians engage with quality improvement? Journal of Health Services Research and Policy, 12 (3), 129–130.
2. Fung, C. H., Lim, Y.-W., Mattke, S., Damberg, C. & Shekelle, P. G. (2008) Systematic review: the evidence that publishing patient care performance data improves quality of care. Annals of Internal Medicine, 148 (2), 111–123.
3. Elting, L. S., Martin, C. G., Cantor, S. B. & Rubenstein, E. B. (1999) Influence of data display formats on physician investigators’ decisions to stop clinical trials: prospective trial with repeated measures. British Medical Journal, 318, 1527–1531.
4. Marshall, T., Mohammed, M. A. & Rouse, A. (2004) A randomized controlled trial of league tables and control charts as aids to health service decision-making. International Journal for Quality in Health Care, 16 (4), 309–315. DOI: 10.1093/intqhc/mzh054.
5. Hawley, S. T., Zikmund-Fisher, B., Ubel, P., Jancovic, A., Lucas, T. & Fagerlin, A. (2008) The impact of the format of graphical presentation on health-related knowledge and treatment choices. Patient Education and Counseling, 73 (3), 448–455. DOI: 10.1016/j.pec.2008.07.023.
6. Gerteis, M., Gerteis, J. S., Newman, D. & Koepke, C. (2007) Testing consumers’ comprehension of quality measures using alternative reporting formats. Health Care Financing Review, 28 (3), 31–45.
7. Brown, P. E. (1992) The relationship between graphic aids in a business report and decision-making and cognitive style of a report reader. Delta Pi Epsilon Journal, 34, 63–76.
8. Speier, C. (2006) The influence of information presentation formats on complex task decision-making performance. International Journal of Human-Computer Studies, 64 (11), 1115–1131. DOI: 10.1016/j.ijhcs.2006.06.007.
9. Uhrig, J. D., Harris-Kojetin, L., Bann, C. & Kuo, T. M. (2006) Do content and format affect older consumers’ use of comparative information in a Medicare health plan choice? Results from a controlled experiment. Medical Care Research and Review, 63 (6), 701–718. DOI: 10.1177/1077558706293636.
10. Fasolo, B., Reutskaja, E., Dixon, A. & Boyce, T. (2010) Helping patients choose: how to improve the design of comparative scorecards of hospital quality. Patient Education and Counseling, 78 (3), 344–349. DOI: 10.1016/j.pec.2010.01.009.
11. Hibbard, J. H., Slovic, P., Peters, E. & Finucane, M. L. (2002) Strategies for reporting health plan performance information to consumers: evidence from controlled studies. Health Services Research, 37 (2), 291–313. DOI: 10.1111/1475-6773.024.
12. Hibbard, J. H., Peters, E., Slovic, P., Finucane, M. L. & Tusler, M. (2001) Making health care quality reports easier to use. The Joint Commission Journal on Quality Improvement, 27 (11), 591–604.
13. Jarvenpaa, S. L. (1989) The effects of task demands and graphical format on information-processing strategies. Management Science, 35 (3), 285–303. DOI: 10.1287/mnsc.35.3.285.
14. Knapp, P., Raynor, D. K. & Berry, D. C. (2004) Comparison of two methods of presenting risk information to patients about the side effects of medicines. Quality & Safety in Health Care, 13 (3), 176–180. DOI: 10.1136/qshc.2003.009076.
15. Mazur, D. J., Hickam, D. H. & Mazur, M. D. (1999) How patients’ preferences for risk information influence treatment choice in a case of high risk and high therapeutic uncertainty: asymptomatic localized prostate cancer. Medical Decision Making, 19 (4), 394–398. DOI: 10.1177/0272989X9901900407.
16. Armstrong, K., FitzGerald, G., Schwartz, J. S. & Ubel, P. A. (2001) Using survival curve comparisons to inform patient decision making – can a practice exercise improve understanding? Journal of General Internal Medicine, 16 (7), 482–485.
17. NHS Information Centre for Health and Social Care (2011) (Online) Available at: http://www.ic.nhs.uk/proms (last accessed 25 October 2011).

Box 1 Recommended form and contents of brief report for clinicians

• Multiple outcomes – table with outcome scores and confidence intervals plus coloured background (scorecard), ordered alphabetically within ‘region’
• PROMs (disease-specific)
  – Patients’ mean change score (risk adjusted) – funnel plot and caterpillar plot
  – Proportion of patients achieving a minimally important difference (risk adjusted) – funnel plot and caterpillar plot
• Single transitional items
  – Proportion of patients who were ‘better’ or ‘much better’ after surgery (‘good news’) (risk adjusted) – funnel plot
• Post-operative problems
  – Proportion of patients reporting a problem (risk adjusted) – funnel plot

The charts should: be based on all providers in England; indicate the subject provider and the four nearest local providers; use consistent colour coding for levels of certainty; be accompanied by clear explanatory text in plain English; and include clear labelling.
