

Delphi: myths and reality
Penelope M. Mullen

Health Services Management Centre, University of Birmingham, Birmingham, UK

Keywords Delphi method, Health care, Values, Forecasting

Abstract The last 20 years have seen increasing interest in the use of Delphi in a wide range of health-care applications. However, this use has been accompanied by attempts to codify and define a “true Delphi”. Many authors take a narrow view of the purpose of Delphi and/or advocate a single prescriptive approach to the conduct of a Delphi study. However, as early as 1975, Linstone and Turoff pointed to the danger of attempting to define Delphi, as one would immediately encounter a study that violated that definition. Through critical examination of some of the controversies and misunderstandings that surround Delphi, this paper aims to dispel some of the myths and demonstrate the wide scope and potential of this versatile approach.

Introduction

From its early applications in technological forecasting in the 1950s, Delphi developed rapidly in many different areas. Healthcare and medical applications have been identified since the 1960s and nursing applications from the 1970s. Despite this history, knowledge of Delphi appears to be uneven. Although hailed by some as “well-known”, popular and a “well-established management technique” (Beech, 1991, p. 208), others imply that Delphi is not widely known (Critcher and Gladstone, 1998; Phillips, 2000).

Many authors have attempted to define the one “true” Delphi. Although often contradicting each other, many dismiss studies that deviate from their prescribed path as not being true Delphis. However, as far back as 1975 in their seminal book, Linstone and Turoff (1975, p. 3) stated that if they attempted to define Delphi “the reader would no doubt encounter at least one contribution to this collection which would violate our definition”. They noted that “there are many different views on which are the ‘proper’, ‘appropriate’, ‘best’, and/or ‘useful’ procedures for accomplishing the various specific aspects of Delphi”.

There is a danger that over-prescription and narrow definition of Delphi will inhibit many valuable applications of this versatile technique. This paper explores some of the myths and reality surrounding Delphi and attempts to break away from the narrow prescriptive approaches.

Delphi and its critics

What is Delphi?

Despite their misgivings, Linstone and Turoff (1975, p. 3) did offer an “underlying” definition:

Delphi may be characterised as a method for structuring a group communication process so that the process is effective in allowing a group of individuals, as a whole, to deal with a complex problem.


Journal of Health Organization and Management
Vol. 17 No. 1, 2003, pp. 37-52
© MCB UP Limited, 1477-7266
DOI 10.1108/14777260310469319

They added that to “accomplish this ‘structured communication’ there is provided: some feedback of individual contributions of information and knowledge; some assessment of the group judgement or view; some opportunity for individuals to revise views; and some degree of anonymity for the individual responses”.

Delphi usually involves sending a questionnaire, which may be structured or relatively unstructured, to the respondents, who are commonly termed an “expert panel”. The responses are collated and the original or a revised questionnaire is re-circulated, frequently accompanied by an anonymised summary of responses. Panellists are invited to confirm or to modify their previous response. This procedure is repeated for a pre-determined number of rounds or until some pre-determined criterion has been fulfilled. Panellists may also be asked to give an explanation or justification for their response. Thus, Delphi typically involves a number of rounds, feedback of responses to participants between rounds, opportunity for participants to modify their responses, and anonymity of responses.
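The round structure just described can be sketched in code. This is a minimal illustration under stated assumptions, not a prescribed implementation: the single numerical question, the simulated panellists and their halfway-to-the-median revision rule are all hypothetical devices for the example.

```python
from statistics import median

def run_delphi(panel, question, rounds=3):
    """Run a simple numerical Delphi: collect responses, feed back an
    anonymised group summary, and invite revision in each later round."""
    responses = [member(question, None) for member in panel]  # round one, no feedback
    for _ in range(rounds - 1):
        summary = {"median": median(responses),
                   "min": min(responses), "max": max(responses)}
        responses = [member(question, summary) for member in panel]
    return responses

def make_panellist(initial):
    """A hypothetical panellist who, on seeing the group summary, revises
    halfway towards the median; only the aggregate summary is circulated,
    so the anonymity of individual responses is preserved."""
    state = {"estimate": initial}
    def respond(question, summary):
        if summary is not None:
            state["estimate"] = (state["estimate"] + summary["median"]) / 2
        return state["estimate"]
    return respond

panel = [make_panellist(v) for v in (2.0, 4.0, 10.0)]
final = run_delphi(panel, "In how many years will X occur?", rounds=3)
```

With initial estimates of 2, 4 and 10, the spread narrows from 8 to 2 over three rounds while the median holds at 4, a toy version of the centralising tendency discussed later in the paper.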

Although many early studies focused on long-term forecasting, since then studies have sought judgements, views, opinions and/or estimates in a wide variety of contexts.

Terminology

Over the years, many labels describing “types” of Delphi have been used. However, as shown in the list below, some labels relate to the type of application, some to the method of “scoring” used and some just imply that the approach is different:

• Delphi;
• classical Delphi;
• conventional Delphi;
• real-time Delphi;
• Delphi conference;
• policy Delphi;
• decision Delphi;
• historical Delphi;
• Delphi forecast;
• expert Delphi;
• ranking Delphi;
• goals Delphi;
• fuzzy Delphi;
• numerical Delphi;
• analytical Delphi method;


• quantitative Delphi;
• reactive Delphi;
• modified Delphi;
• Delphi variant;
• max-min Delphi;
• normative Delphi;
• exploratory Delphi;
• laboratory Delphi.

The way that a Delphi “study” is described also varies, frequently within the same article, e.g.

• Delphi;
• the Delphi;
• (the) Delphi method;
• Delphi research;
• (the) Delphi process;
• (the) Delphi methodology;
• the Delphi approach;
• (the) Delphi technique;
• Delphi survey;
• Delphi concept;
• Delphi applications;
• the Delphi expert;
• consultation method;
• a Delphi inquiry;
• Delphi panels;
• the Delphi panel technique;
• the Delphi panel method;
• the Delphi survey technique;
• a Delphi consultation;
• Delphi investigation.

Critiques of Delphi

The early days saw major debates on epistemology, with a major focus of criticism being Delphi’s alleged failure to follow accepted scientific procedures, in particular, the lack of psychometric validity (Sackman, 1975). Defenders of Delphi, however, argued that it deals with areas that do not lend themselves to


traditional scientific approaches. Helmer (1977, p. 18) argued that futures analysis, one of the major applications of Delphi, “is inevitably conducted in a domain of what might be called ‘soft data’ and ‘soft laws’ . . . Standard operations-research techniques . . . have to be augmented by judgemental information”. Helmer (1977, pp. 18-19) further argued that Delphi “cannot legitimately be attacked . . . for using mere opinions and for violating the rules of random sampling in the ‘polling’ of experts”. Such criticisms, he argued, “. . . rest on a gross misunderstanding of what Delphi is . . . it should be pointed out that a Delphi inquiry is not an opinion poll”.

Later, Rieger (1986, p. 196) suggested: “Sackman’s article reveals the operation of a Kuhnian paradigm and may be viewed as part of the wider debate about quantitative v qualitative research, but it was not identified as such at the time”. More recently, along similar lines, Critcher and Gladstone (1998, p. 433) suggested that Delphi may suffer from its hybrid epistemological status:

While it can produce quantified results within a recognizably positivist tradition, the definition of the problem and the solutions to it by those who are the subjects of the research place it close to constructivist positions. Delphi straddles the divide between qualitative and quantitative methodologies.

Despite this recognition, it is apparent that many relatively recent criticisms of Delphi and attempts to prescribe the correct approach stem from the positivist critique.

Targets of criticism and sources of controversy and misunderstanding include the use of an “expert” panel, consensus, questionnaire construction, anonymity and interaction between panel members. These are discussed below.

The “expert” panel

Who is an “expert”?

Sackman (1975) criticises the use of experts. “What is an ‘expert’ in the target field”, he asks (Sackman, 1975, p. 695), and “how are such experts operationally defined?”, arguing that “It is almost impossible to find current psychometric or social science literature on ‘experts’” (Sackman, 1975, p. 703). He cites the “pervasive expert halo effect” (Sackman, 1975, p. 704) and questions whether responses from experts will be significantly better than those from non-experts who are “informed”.

Although experts are often assumed to be professionally or scientifically qualified and/or to have achieved high status, an early study on the future of communication services in the residential market used an “expert” panel of housewives (Linstone, 1978). Pill (1971) suggested that an “expert” should be defined as anyone with a relevant input. Linstone (1978, p. 294) noted that “a policy Delphi generally cannot be confined to so-called experts. The formulation of national policy must obviously include the public at large . . .


it is therefore important to include in the Delphi representatives of a large or wide spectrum of vested interests, ranging from bureaucrats to minority groups”. He also reported a regional planning Delphi which included influential community leaders on the panel. More recently, a study on policy priorities for improved diabetes care used patients on the panel alongside clinicians (Gallagher et al., 1996).

According to Cantrill et al. (1996, p. 69), some commentators argue “that the definition [of an expert] should include any individual with relevant knowledge and experience of a particular topic, including patients and carers”.

Some studies ask “experts” to self-rate their expertise in the area or their confidence in their responses, for example, weighting their expertise on each question on a 0-10 scale (Ishikawa et al., 1993), describing their knowledge of each area as being derived from “awareness”, “reading” or “working” (Bender et al., 1969) or evaluating their familiarity with each item as fair, good or excellent (Linstone, 1978). Such ratings may be used to weight responses or as filters to determine inclusion in subsequent rounds. However, the efficacy of such self-rating is disputed, not least because “different people have very different ways of rating their own expertise” (Pill, 1971, p. 62).
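One way such self-ratings might be used for weighting and filtering can be sketched as follows; the function name, the 0-10 weighting scheme and the filter threshold are illustrative assumptions, not a method prescribed in the literature.

```python
def weighted_group_estimate(responses, self_ratings, threshold=0):
    """Aggregate numerical responses, weighting each by the panellist's
    0-10 self-rated expertise; ratings below `threshold` are excluded,
    mirroring the use of ratings as filters."""
    kept = [(r, w) for r, w in zip(responses, self_ratings) if w >= threshold]
    total_weight = sum(w for _, w in kept)
    if total_weight == 0:
        raise ValueError("no panellist passed the expertise filter")
    return sum(r * w for r, w in kept) / total_weight

# Three hypothetical forecasts of a year, with self-ratings 2, 5 and 10:
estimate = weighted_group_estimate([2000, 2010, 2020], [2, 5, 10], threshold=3)
```

Here the low-confidence response (rating 2) is filtered out and the remaining responses are averaged with weights 5 and 10, giving roughly 2016.7. Pill’s caveat applies: panellists rate their expertise in very different ways, so weighted results should be interpreted cautiously.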

Size of panel, response rate and attrition

Although many pioneering Delphis used very small panels, Linstone (1978, p. 274) reported panels of varying sizes up to the low hundreds, also noting a Japanese Delphi “involving several thousand people”. Linstone (1978, p. 296) reported findings that “a suitable minimum panel size is seven”, with accuracy deteriorating rapidly with smaller sizes and improving more slowly with large numbers.

For Cavalli-Sforza and Ortolano (1984, p. 325) a “typical Delphi panel has about 8 to 12 members”, while for Phillips (2000, p. 193) “[t]he optimum size of the panel is seven to twelve members”. For a policy Delphi, Turoff (1970, p. 153) suggests “anywhere from ten to fifty people”. In contrast, Wild and Torgersen (2000) suggest panel sizes of 300-500 are usually considered sufficient. Cantrill et al. (1996, p. 69) report that “published studies in health applications have used panel sizes varying from 4 to 3000” and recommend that size “should be governed by the purpose of the investigation” – a very important point, since many of the criticisms of small panels appear to confuse Delphis with conventional quantitative surveys.

Concerns have been expressed about bias resulting from low response rates and high attrition rates (drop-out rates between rounds). Walker and Selfe (1996) note response rates from, as they express it, an unacceptable 8 per cent to an excellent 100 per cent. Walker and Selfe (1996, p. 679) argue that, in order to maintain rigour, “a 70% minimum response rate should be achieved”, but offer little support for this claim. However, as Reid (1988) notes, evidence from most


studies is that the larger the panel the higher the drop-out rate – with panels of 20 tending to keep their members.

Fears of self-selection bias have been expressed but may prove unfounded. McKee et al. (1991) report a study which found no significant difference between the characteristics of medical consultants willing to take part in an expert panel and those who were not.

Selection of the panel

According to Reid (1988, p. 245), the “criticism [of Delphi] that does stand scrutiny is the danger inherent in the selection of the panel”, with Williams and Webb (1994, p. 182) noting that only one of 13 studies examined by Reid “actually selected a random sample”. Beech (1999, p. 283) observes that, because experts are selected or nominated, there is an “absence of the usual representative sampling techniques”. Reid (1988, p. 245) claims that “introduction of some basic sampling techniques with follow-ups of non-respondents would be a worthwhile innovation where the population of experts is genuinely large”.

However, Beretta (1996, p. 83) points out that “Representative sampling techniques may be inappropriate when expert opinions are required”. Goodman (1987, p. 730) notes that the originators of Delphi “tend not to advocate a random sample of panellists . . . instead the use of experts or at least informed advocates is recommended”, especially in forecasting. Early on, Helmer (1977, pp. 18-19) argued that “it should be pointed out that a Delphi inquiry is not an opinion poll, relying on drawing a random sample from ‘the population of experts’; rather, once a set of experts has been selected (regardless of how), it provides a communication device for them, that uses the conductor of the exercise as a filter in order to preserve anonymity of responses”.

Some applications require panels covering a wide range of interests and disciplinary viewpoints. Indeed, Linstone and Turoff (1975, p. 4) list applications where the “heterogeneity of the participants must be preserved to assure validity of the results” as leading to the need to employ Delphi. Pill (1971, p. 62) suggests that “many innovations and real breakthroughs . . . occur from outside a discipline or specialty”, adding further that “one asset of the use of a group is the diversity of opinion they bring to bear thus minimising the possibility of overlooking some obvious facet of a question”.

Consensus and questionnaire design

Consensus

It is a common conception that a defining objective of Delphi is to achieve consensus. Dalkey and Helmer (1963, p. 458) stated that “Its [Delphi’s] object is to obtain the most reliable consensus of opinion of a group of experts”. Lindeman (1975, p. 435) stated that “The Delphi technique . . . involves the use


of a series of questionnaires designed to produce group consensus”. More recently, Phillips (2000, p. 192) suggested that “[t]he Delphi technique is a method for obtaining consensus of informed opinion by soliciting the views of experts in the specific field being studied”. Parston (1995, p. 4) claims that Delphi “is designed to force consensus”.

However, not all Delphis achieve, or even seek, a consensus. Turoff (1970, p. 153) suggested that the goal of a policy Delphi may be “to establish all the differing positions advocated and the principal pro and con arguments for those positions”. Linstone (1978) observed that we are often more concerned with determining the degree of polarization of respondents than consensus. More recently, Walker and Selfe (1996, p. 680) noted that “[a]lthough the Delphi method was originally developed in order to gain consensus, this may not be achieved and in some cases may not be required”.

Rather than aiming to achieve consensus, Jones and Hunter (1995, p. 376) claim that consensus methods aim “to determine the extent to which experts or lay people agree about a given issue”. Xiao et al. (1997, p. 208) agree, noting that “[t]he Delphi method is a means of determining the extent to which a consensus exists amongst a group of people”.

Critcher and Gladstone (1998, p. 432) suggest that the intended outcome of a Delphi “may include any or all of the following: identifying the degree of consensus or dissensus, specifying the range of different positions, and revealing the rationales which lie behind the judgements”.

Where achieving consensus is a primary aim, there is a danger that possibly important variations in views will be concealed – for example, a “risk that extreme opinions will be masked by the statistical analysis” (Rudy, 1996, p. 19). Scheibe et al. (1975, p. 277) warn that consensus measures, such as a narrow interquartile range, “do not take full advantage of the information available in the distributions”. A bimodal distribution, for example, will not register as consensus but may indicate “an important and insoluble cleft of opinion”, or a distribution may flatten out with no peak at all. “The results of the Delphi are no less important for this”. Indeed, they continue, “considering that there is a strong natural tendency in the Delphi for opinion to centralize, resistance in the form of unconsensual distribution should be viewed with special interest”.

This “natural tendency” could indeed result in a false or a forced consensus. As Woudenberg (1991, p. 145) notes, there are “indications that group pressure to conformity is very strong in a Delphi, this makes consensus in a Delphi suspect and no way related to genuine agreement”. Linstone and Turoff (1975) note that one of the common reasons for failure in a Delphi is ignoring and not exploring disagreements. If dissenters drop out, then there is artificial consensus.

Whether or not a consensus should even be sought depends on the purpose of the Delphi. With positive questions, the aim is to find the correct answer, whether or not it is an outlier, rather than a unanimously agreed wrong answer. Hence


the importance of exploring disagreements, as the outlier might be correct. However, where the aim is to obtain normative views, seeking consensus might well be appropriate.

First round and formulation of the “questionnaire”

According to some commentators, the first round must be open-ended, inviting the panellists to identify, as appropriate to the study, issues, forecasts, views etc. For example, Bender et al. (1969) asked panellists to list important discoveries, changes, events etc. which they thought might happen in the next 50 years. Bramwell and Hykawy (1999, p. 49) asked for ten predictions “regarding the future of nursing education in the next 50 years”. Lindeman (1975) used round one to identify “burning issues” related to clinical nursing research, and Moscovice et al. (1988) asked panellists for problems, relating to health and healthcare, facing citizens of Washington State. The “items” thus identified are then synthesised by the researcher or the monitor team into a structured questionnaire, which is circulated to the panel in round two.

Judge and Podgor (1983, p. 400), in their study using Delphi in a citizen participation project, conclude that allowing “respondents to make the first round choices without a seed list . . . will assist in developing a set of choices more representative of the wants of the participants”. Indeed, an open-ended first round has been used as a criterion for judging whether a study is well-conducted (Rieger, 1986). Despite this, in many Delphi studies the initial questionnaire is developed by the researcher, often after an extensive literature review, or by a sub-panel of “experts” or a monitor team. For example, Xiao et al. (1997) drew on a literature review to identify 39 potential influencing factors on length of stay and then consulted five experts before sending the questionnaire to the main panellists in round one.

Some commentators are prescriptive. For example, Evans (1997, p. 123) states that “[i]nitially, a questionnaire is developed by the Delphi investigator, usually with the assistance of an outside expert”. Charlton et al. (1981, p. 288) assert that “[a] Delphi study begins when a small monitor team designs a questionnaire for larger respondent groups” and Wild and Torgersen (2000, p. 114) state that “working groups formulate the statements for the questionnaire”.

Questionnaire design

Although some Delphis have been criticised for poor questionnaire construction, similar criticism could equally be directed at poorly designed questionnaires used in conventional surveys. Goldschmidt (1975, pp. 198-9) notes there is nothing in the characteristics of Delphi which suggests that “the implementation of the procedures should not conform to professional standards for questionnaire design”. However, Delphi does give the researcher or monitor team an unusual degree of power. Linstone and Turoff (1975), identifying the


imposition of monitor views and exclusion of contributions as common reasons for failure, stressed the importance of the honesty of the monitor team.

Virtually all Delphis use self-completion questionnaires, sometimes in an electronic format. However, Oranga and Nordberg (1993) overcame problems of low levels of education and literacy in rural Kenya by using interviewers to assist panel members to complete the successive questionnaires.

Scoring methods

Early futures studies asked panellists to indicate the probability of events occurring by a particular date or the date by which they were, say, 50 per cent or 90 per cent certain that the event would occur. Some also sought estimates of the impact those events would have. Subsequent Delphis have sought information about the value, importance, priority, urgency, desirability, feasibility, probability of success etc. of the items presented.

Despite attempts at prescription, many different approaches to scoring have been used. Beech (1999) reports a seven-point scale used to indicate “likelihood of occurrence” of developments in mental health. Bramwell and Hykawy (1999) asked panellists to predict in which (broad) time period 38 “events” would occur. Robinson (1991) reports that respondents were asked to mark a projection on a graph, give their judgement of the reliability of this forecast, assumptions and uncertainties and, in a later round, to re-estimate the trend, assess reliability on a three-point scale and rate the validity of each assumption. Loos et al. (1985) report a US study in which respondents were asked to give their prognosis of future federal funding for MCH programmes (in four categories) and their preferred priority for funding (in three categories). Another study asked respondents to estimate the probability of occurrence of developments in mental health (on a three-point scale) and their potential consequences for the prevalence of dementia (++ = strong increase, + = increase, 0 = no or little change, − = decrease, −− = strong decrease) (Bijl, 1992).

Campbell et al. (1999) asked respondents to rate the validity of primary care quality indicators on a one- to nine-point scale. Lindeman (1975) asked respondents to indicate (yes/no) whether nursing should assume primary research responsibility for each area and to score, on one-to-seven scales, both the importance of the area and the likelihood of a resulting change in patient welfare. Xiao et al. (1997) used interviews to ask respondents to rank 39 items and to rate, on a one-to-five scale, their effect on length of stay. In a study seeking to improve the pre-registration midwifery curriculum (Fraser, 1999), respondents were asked their agreement/disagreement with 32 problem areas identified earlier, and then to “rank” each item from 1 (not important) to 5 (very important).

In another study, participants were asked, given existing expenditure patterns, to allocate £1,000 between services (Charlton et al., 1981). Moscovice


et al. (1988) asked respondents to indicate the ten most important and five least important of 29 potential health priority items and, in a later round, to assign scores from one to ten to a reduced list of 23 items.

Despite debate elsewhere about problems relating to the validity of scoring methods and aggregation over respondents (Mullen and Spurgeon, 2000), few Delphis consider this aspect problematic. However, some studies have explicitly compared Delphi scoring methods; for example, Scheibe et al. (1975) tested ranking, rating and paired-comparison approaches and Mullen (1983) compared results using multiple-vote and budget-pie methods.

Number of rounds

Feedback to respondents and the opportunity to revise earlier responses are arguably defining features of Delphi. Such provision obviously requires at least two rounds. Beyond that, the number of rounds required is disputed, and individual studies have been found using two, three, four and even five rounds. Turoff (1970, p. 161) suggests that a “policy Delphi requires at least four to five rounds as opposed to the two or three that are usually sufficient for the technological type Delphi”. Others more prescriptively refer, for example, to “the need to conduct four rounds” (Rudy, 1996, p. 19). Sumsion (1998, p. 153) states that “the classic Delphi technique had four rounds” but that “current consensus appears to be that either two or three rounds are preferred”. Walker and Selfe (1996, p. 679) suggest that, as “repeated rounds may lead to fatigue by respondents and increased attrition”, most studies use only two or three rounds.

Many commentators suggest that Delphi rounds should be continued until consensus is achieved. However, Scheibe et al. (1975, pp. 277-8) proposed, in place of consensus as a stopping criterion, the “stability of the respondents’ vote distribution curve over successive rounds of the Delphi”, which, they concluded (Scheibe et al., 1975, pp. 280-1), “preserves any well-defined disagreements that may exist”.
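A crude version of such a stability criterion can be sketched as below. The shift measure and the 15 per cent tolerance are assumptions made for illustration; Scheibe et al.’s own measure was defined on the vote distribution curve and is not reproduced here.

```python
from collections import Counter

def distribution_shift(previous, current):
    """Fraction of votes that moved between categories from one round
    to the next, computed from the two frequency distributions."""
    prev_counts, curr_counts = Counter(previous), Counter(current)
    categories = set(prev_counts) | set(curr_counts)
    moved = sum(abs(curr_counts[c] - prev_counts[c]) for c in categories) / 2
    return moved / len(current)

def is_stable(previous, current, tolerance=0.15):
    """Stop when the distribution barely changes between rounds, even if
    opinion remains split; well-defined disagreement is preserved."""
    return distribution_shift(previous, current) <= tolerance

round_2 = ["agree"] * 4 + ["disagree"] * 6
round_3 = ["agree"] * 7 + ["disagree"] * 3
round_4 = ["agree"] * 7 + ["disagree"] * 3
```

Between rounds 2 and 3 the shift is 0.3, so polling would continue; between rounds 3 and 4 it is 0.0, so the study stops with a stable 70/30 split recorded rather than a forced consensus.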

Although a minimum of two rounds (or three when round one is open-ended) is needed to allow feedback and revision of responses, a number of recent health-related studies (e.g. Gallagher et al., 1996; Butterworth and Bishop, 1995) appear to employ an open-ended first round with only one further round, thus giving no opportunity for respondents to reconsider or modify their responses. Other practices, which appear to preclude opportunities for modifying responses, are using different panels or, more commonly, completely different questionnaires in successive rounds.

Anonymity and interaction between respondents

Anonymity

Another defining feature of Delphi, and one of its claimed strengths, is anonymity. Anonymity removes effects of status, powerful personalities and


group pressure which can arise in meetings. Charlton et al. (1981, p. 288) suggest anonymity means that “at no time need respondents feel compelled to compromise their views as they might in a committee meeting”. Moscovice et al. (1988) point to the need to preserve anonymity to prevent domination by a small group. Anonymity allows “honest expression of views without the intimidation, inhibition or peer-pressure factors . . .” (Rudy, 1996, p. 19) and “reappraisal of a viewpoint without loss of face” (Sumsion, 1998, p. 154).

However, anonymity is also seen as a weakness. Sackman (1975, p. 712) argues that “under ‘no disclosure of names’ anonymity, no individual is accountable for any of his own responses or for group Delphi results . . . Delphi embodies circular buck-passing”. Other disadvantages are that anonymity could limit “the extent to which exploratory thinking is possible” (Bowles, 1999, p. 32) and remove “the stimulation and spawning of ideas” (Rudy, 1996, p. 19).

But what does anonymity consist of? Very rigid and extensive constructions of anonymity have been suggested, for example, requiring that panel members are unknown to each other (Robinson, 1991; Saito and Sinha, 1991) and even that responses are anonymous to researchers (Sumsion, 1998; Fraser, 1999). While both might have a place in some studies (although the latter precludes personalised feedback and challenges to outliers), probably the “essential anonymity” is that responses are anonymous to other panel members (i.e. panellists do not know who made which response).

However, does Delphi require that such anonymity be preserved throughout the study? Delphis have been recorded which have a face-to-face meeting in place of the final round or, more controversially, which start with such a meeting. It might thus be argued that Delphi requires only that the anonymity of responses be preserved for at least part of the study.

Analysis and format of feedback

Feedback is an important feature of Delphi. Apart from the feedback of justifications (discussed below), most feedback is numerical or statistical, with some form of aggregated group response. However, non-quantitative feedback of results can be very extensive. Lindeman (1975) reports the feedback of a 79-page minority report of round three comments. Statistical feedback often uses medians, usually accompanied by minima and maxima, quartiles and/or the inter-quartile range. Some studies use means, often accompanied by the standard deviation and/or range. Phillips (2000) reports feedback of modal responses. Box-and-whisker diagrams have also been used.

Many Delphis use more detailed feedback in the form of frequency distributions (both numerical and graphical) (Scheibe et al., 1975; McKee et al., 1991; Xiao et al., 1997). Indeed, according to McKenna (1994, p. 1222) the “use of frequency distributions to identify patterns of agreement” is a key characteristic of Delphi. The advantages of distributions are that no information is lost, bimodal distributions can be identified, the existence of extreme outliers (who may nevertheless have important views) can be identified, and opposing views are not simply “averaged” (Mullen and Spurgeon, 2000, p. 86).
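The advantage of distributional feedback can be sketched as follows; the panel data and rating scale are hypothetical, chosen so that a bimodal split is visible in the tally where a mean alone would suggest spurious agreement.

```python
from collections import Counter

def frequency_feedback(ratings, scale=range(1, 10)):
    """Count how many panellists chose each point on a 1-9 rating scale.

    Feeding back the whole distribution, rather than a single average,
    preserves bimodal patterns and extreme outliers.
    """
    counts = Counter(ratings)
    return {point: counts.get(point, 0) for point in scale}

# A split panel: two opposing camps, yet the mean (5.3) sits between them
split_panel = [2, 2, 3, 3, 3, 7, 8, 8, 8, 9]
for point, n in frequency_feedback(split_panel).items():
    print(f"{point}: {'*' * n}")
```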

Feedback, however, is not without its dangers. Scheibe et al. (1975) tested the effect of giving false feedback and, although over subsequent rounds ratings did return towards the original (true) response, some residual effect of the false feedback remained. Crisp et al. (1999, p. 36) suggest that “Researchers need to discuss the role of feedback in the context of a growing debate surrounding the influence feedback exerts on participants’ decisions and its value”.

Explanation/justification
Since the pioneering Delphis, it has been common for panellists, usually the “outliers” in later rounds, to be asked to argue, justify and/or provide supporting evidence for their responses. Such material is commonly fed back to all panellists in the next round, but in some studies it is withheld as a matter of policy or simply because there are no more rounds.

As noted earlier, Pill (1971, p. 62) stresses that using a group, rather than an individual, allows “diversity of opinion . . . minimising the possibility of overlooking some obvious facet of a question”. He adds that this “property is enhanced by allowing highly divergent viewpoints to state the reasons for their eccentricity” and further cites a study which suggests that the “predictions of those individuals giving substantive reasons . . . [were] . . . better than those whose reasons were tautological or non-existent”.

However, some Delphis appear not to solicit such explanations from respondents, leading Sumsion (1998, p. 154) to conclude that “because there is no opportunity to interact with the participants, the researcher does not know the rationale behind their responses” and there “is also no opportunity for respondents to elaborate on their views”.

Discussion and conclusions
Using Delphi
Delphi typically involves a number of rounds, feedback of responses to participants between rounds, opportunity for participants to modify their responses, and anonymity of responses. Beyond that it can take a variety of forms, with factors such as panel size, composition and selection, questionnaire design, number of rounds, the form of feedback, and the treatment of consensus, being determined by the requirements of specific applications. However, whatever the application, good research practice should be followed drawing, as appropriate, on quantitative and qualitative methodologies. The considerable power that Delphi affords the researcher or monitor team serves to reinforce the importance of adhering to good – and appropriate – research practice. However, it can be unhelpful to judge the validity of a particular Delphi from a research paradigm which is irrelevant to that particular application.

Delphi is used as an alternative to conventional meetings to avoid problems arising from powerful personalities, group pressure and the effects of status. Its use is also advocated where attending a meeting would be difficult or expensive and also because it “allows the utilisation of larger numbers of people than can be effectively employed by the committee approach” (Turoff, 1970, p. 153). However, if Delphi is substituted for a meeting primarily on grounds of cost and/or logistics, maintaining anonymity would not appear necessary.

Delphi used as an alternative to conventional surveys allows greater interaction with respondents via feedback and justifications, and permits respondents to reconsider and modify their responses. Further, where round one is open-ended, respondents have a greater role in “setting the agenda” than is possible with conventional surveys.

Health service applications
Medical and health service applications have included forecasting developments in medicine and health technology, forecasting changes in disease patterns and forecasting future funding patterns. Delphi has been employed extensively to help identify priorities for nursing research and also priorities for spending and service developments. It has assisted in the development of clinical guidelines and nursing practice guidelines. Delphi panels have successfully included patients and other lay people.

Conclusions
With the large number of good studies reported and several, sometimes excellent, commentaries/critiques of Delphi, why are there still so many misconceptions about the approach and attempts to be over-prescriptive? One reason must lie in Linstone and Turoff’s (1975, p. 6) perceptive warning of the problems which arise “when a Delphi designed for a particular application is taken as representative of all Delphis”.

Another problem arises from the tendency to confine searches to one particular application area, for example nursing research, which can inhibit wider dissemination of the potential of Delphi and prevent the cross-fertilisation of ideas.

In common with many studies, Delphi reports often give very little information as to the actual scoring, aggregation and feedback methods employed (Mullen and Spurgeon, 2000, pp. 116-17). Greater clarity and detail would assist in dissemination of practice and prevent unnecessary re-invention of the wheel.

Delphi has enormous potential in many areas of health services research and practice. These range from the more traditional forecasting and futures studies, through priority setting, to areas of consumer and patient involvement. However, in order to realise this potential it is essential to avoid over-restrictive, narrow prescriptions of Delphi.

References

Beech, B.F. (1991), “Changes: the Delphi technique adapted for classroom evaluation of clinical placements”, Nurse Education Today, Vol. 11, pp. 207-12.

Beech, B. (1999), “Go the extra mile - use the Delphi technique”, Journal of Nursing Management, Vol. 7 No. 5, pp. 281-8.

Bender, A.D., Stract, A.E., Ebright, G.W. and von Haunalter, G. (1969), “Delphic study examines developments in medicine”, Futures, Vol. 1, pp. 289-303.

Beretta, R. (1996), “A critical review of the Delphi technique”, Nurse Researcher, Vol. 3 No. 4, pp. 79-89.

Bijl, R. (1992), “Delphi in a future scenario study on mental health and mental health care”, Futures, Vol. 24, pp. 232-50.

Bowles, N. (1999), “The Delphi technique”, Nursing Standard, Vol. 13 No. 45, pp. 32-6.

Bramwell, L. and Hykawy, E. (1999), “The Delphi technique: a possible tool for predicting future events in nursing education”, Canadian Journal of Nursing Research, Vol. 30 No. 4, pp. 47-58, reprint of 1974 article.

Butterworth, T. and Bishop, V. (1995), “Identifying the characteristics of optimum practice: findings from a survey of practice experts in nursing, midwifery and health visiting”, Journal of Advanced Nursing, Vol. 22, pp. 24-32.

Campbell, S.M., Hann, M., Roland, M.O., Quayle, J.A. and Shekelle, P.G. (1999), “The effect of panel membership and feedback on ratings in a two-round Delphi survey – results of a randomized controlled trial”, Medical Care, Vol. 37 No. 9, pp. 964-8.

Cantrill, J.A., Sibbald, B. and Buetow, S. (1996), “The Delphi and nominal group techniques in health services research”, International Journal of Pharmacy Practice, Vol. 4 No. 2, pp. 67-74.

Cavalli-Sforza, V. and Ortolano, L. (1984), “Delphi forecasts of land-use – transportation interactions”, Journal of Transportation Engineering, Vol. 110 No. 3, pp. 324-39.

Charlton, J.R.H., Patrick, D.L., Matthews, G. and West, P.A. (1981), “Spending priorities in Kent: a Delphi study”, Journal of Epidemiology and Community Health, Vol. 35 No. 4, pp. 288-92.

Crisp, J., Pelletier, D., Duffield, C., Nagy, S. and Adams, A. (1999), “It’s all in a name: when is a ‘Delphi study’ not a Delphi study?”, Australian Journal of Advanced Nursing, Vol. 16 No. 3, pp. 32-7.

Critcher, C. and Gladstone, B. (1998), “Utilizing the Delphi technique in policy discussion: a case study of a privatized utility in Britain”, Public Administration, Vol. 76 No. 3, pp. 431-50.

Dalkey, N. and Helmer, O. (1963), “An experimental application of the Delphi method to the use of experts”, Management Science, Vol. 9, pp. 458-67.

Evans, C. (1997), “The use of consensus methods and expert panels in pharmacoeconomic studies – practical applications and methodological shortcomings”, Pharmacoeconomics, Vol. 12 No. 2.1, pp. 121-9.

Fraser, D.M. (1999), “Delphi technique: one cycle of an action research project to improve the pre-registration midwifery curriculum”, Nurse Education Today, Vol. 19, pp. 495-501.

Gallagher, M., Bradshaw, C. and Nattress, H. (1996), “Policy priorities in diabetes care: a Delphi study”, Quality in Health Care, Vol. 5, pp. 3-8.

Goldschmidt, P.G. (1975), “Scientific inquiry or political critique?”, Technological Forecasting and Social Change, Vol. 7, pp. 195-213.

Goodman, C.M. (1987), “The Delphi technique: a critique”, Journal of Advanced Nursing, Vol. 12, pp. 729-34.

Helmer, O. (1977), “Problems in futures research: Delphi and causal cross-impact analysis”, Futures, Vol. 9, pp. 17-31.

Ishikawa, A., Amagasa, M., Shiga, T., Tomizawa, G., Tatsuta, R. and Mieno, H. (1993), “The max-min Delphi method and fuzzy Delphi method via fuzzy integration”, Fuzzy Sets and Systems, Vol. 55 No. 3, pp. 241-53.

Jones, J. and Hunter, D. (1995), “Consensus methods for medical and health-services research”, British Medical Journal, Vol. 311 No. 7001, pp. 376-80.

Judge, R.M. and Podgor, J.E. (1983), “Use of the Delphi in a citizen participation project”, Environmental Management, Vol. 7 No. 5, pp. 399-400.

Lindeman, C.A. (1975), “Delphi survey of priorities in clinical nursing research”, Nursing Research, Vol. 24 No. 6, pp. 434-41.

Linstone, H.A. (1978), “The Delphi technique”, in Fowles, R.B. (Ed.), Handbook of Futures Research, Greenwood, Westport, CT, pp. 271-300.

Linstone, H.A. and Turoff, M. (1975), “Introduction to the Delphi method: techniques and applications”, in Linstone, H.A. and Turoff, M. (Eds), The Delphi Method: Techniques and Applications, Addison-Wesley Publishing Company, Reading, MA, pp. 3-12.

Loos, G.P., Smith, R.G. and Roseman, C. (1985), “Probable future funding priorities in maternal and child health: a modified Delphi national survey”, Journal of Health Politics, Policy and Law, Vol. 9 No. 4, pp. 683-93.

McKee, M., Priest, P., Ginzler, M. and Black, N. (1991), “How representative are members of expert panels?”, Quality Assurance in Health Care, Vol. 3, pp. 89-94.

McKenna, H.P. (1994), “The Delphi technique: a worthwhile research approach for nursing?”, Journal of Advanced Nursing, Vol. 19, pp. 1221-5.

Moscovice, I., Armstrong, R., Shortell, S. and Bennett, R. (1988), “Health services research for decision-makers: the use of the Delphi technique to determine health priorities”, Journal of Health Politics, Policy and Law, Vol. 2 No. 3, pp. 388-410.

Mullen, P.M. (1983), “Delphi-type studies in the health services: the impact of the scoring system”, HSMC, University of Birmingham, Research Report 17.

Mullen, P.M. and Spurgeon, P. (2000), Priority Setting and the Public, Radcliffe Medical Press, Abingdon.

Oranga, H.M. and Nordberg, E. (1993), “The Delphi panel method for generating health information”, Health Policy and Planning, Vol. 8 No. 4, pp. 405-12.

Parston, G. (1995), “Introduction”, The Future of Public Services – 2007, Office for Public Management.

Phillips, R. (2000), “New applications for the Delphi technique”, Annual “San Diego” Pfeiffer and Company, Vol. 2, pp. 191-6.

Pill, J. (1971), “The Delphi method: substance, context, a critique and an annotated bibliography”, Socio-economic Planning Science, Vol. 5 No. 1, pp. 57-71.

Reid, N. (1988), “The Delphi technique: its contribution to the evaluation of professional practice”, in Ellis, R. (Ed.), Professional Competence and Quality Assurance in the Caring Professions, Chapman & Hall, London, pp. 230-62.

Rieger, W.G. (1986), “Directions in Delphi developments: dissertations and their quality”, Technological Forecasting and Social Change, Vol. 29 No. 2, pp. 195-204.

Robinson, J.B.L. (1991), “Delphi methodology for economic-impact assessment”, Journal of Transportation Engineering, Vol. 117 No. 3, pp. 335-49.

Rudy, S.F. (1996), “A review of Delphi surveys conducted to establish research priorities by specialty nursing organizations from 1985 to 1995”, ORL Head and Neck Nursing, Vol. 14 No. 2, pp. 16-24.

Sackman, H. (1975), “Summary evaluation of Delphi”, Policy Analysis, Vol. 1 No. 4, pp. 693-718.

Saito, M. and Sinha, K.C. (1991), “Delphi study on bridge condition rating and effects of improvements”, Journal of Transportation Engineering, Vol. 117 No. 3, pp. 320-34.

Scheibe, M., Skutsch, M. and Schofer, J. (1975), “Experiments in Delphi methodology”, in Linstone, H.A. and Turoff, M. (Eds), The Delphi Method: Techniques and Applications, Addison-Wesley Publishing Company, Reading, MA, pp. 262-87.

Sumsion, T. (1998), “The Delphi technique: an adaptive research tool”, British Journal of Occupational Therapy, Vol. 61 No. 4, pp. 153-6.

Turoff, M. (1970), “The design of a policy Delphi”, Technological Forecasting and Social Change, Vol. 2 No. 2, pp. 149-71.

Walker, A.M. and Selfe, J. (1996), “The Delphi method: a useful tool for the allied health researcher”, British Journal of Therapy and Rehabilitation, Vol. 3 No. 12, pp. 677-80.

Wild, C. and Torgersen, H. (2000), “Foresight in medicine: lessons from three European Delphi studies”, European Journal of Public Health, Vol. 10 No. 2, pp. 114-9.

Williams, P.L. and Webb, C. (1994), “The Delphi technique: a methodological discussion”, Journal of Advanced Nursing, Vol. 19, pp. 180-6.

Woudenberg, F. (1991), “An evaluation of Delphi”, Technological Forecasting and Social Change, Vol. 40, pp. 131-50.

Xiao, J., Douglas, D., Lee, A.H. and Vemuri, S.R. (1997), “A Delphi evaluation of the factors influencing length of stay in Australian hospitals”, International Journal of Health Planning and Management, Vol. 12 No. 3, pp. 207-18.
