methods of formal consensus in classification/diagnostic criteria and guideline development

11
Methods of Formal Consensus in Classification/Diagnostic Criteria and Guideline Development Raj Nair, MD,* ,1 Rohit Aggarwal, MD, MS, †,1 and Dinesh Khanna, MD, MSc Guidelines or diagnostic criteria in clinical practice assist physicians in their clinical decision- making and improve health outcomes for patients. Diagnostic and classification criteria should be based on evidence from rigorously conducted controlled studies. Formal group consensus methods have been developed to organize subjective judgments and to synthesize them with the available evidence. This review discusses 4 types of formal consensus methods used in the health field and their applications in rheumatology: the Delphi method, Nominal Group Technique, RAND/ UCLA Appropriateness Method, and National Institutes of Health consensus development conference. © 2011 Elsevier Inc. All rights reserved. Semin Arthritis Rheum 41:95-105 Keywords: guideline criteria, formal consensus methods, Delphi method, nominal group technique, RAND/UCLA Appropriateness Method, National Institutes of Health consensus development, confer- ence, GRADE, diagnostic criteria P hysicians are faced with diagnostic and treatment dilemmas on a daily basis in their clinic setting. Well-developed diagnostic or treatment guide- lines in clinical practice summarize data for physicians to improve their clinical decision-making for an indi- vidual patient (1). Evidence-based guidelines derived from rigorously conducted controlled studies outper- form expert opinion; however, in practice there are instances where sufficient research-based evidence does not exist (2). Various group consensus methods incor- porate opinions of a group of experts rather than an individual expert in a formalized manner (3,4). Given the diversity of opinion that can occur with the diag- nosis and management of disease, formal group con- sensus methods organize subjective judgments and provide guidance when there is no existing evidence. Formal group consensus methods allow for inclusion of a wide range of knowledge and experience, and inter- action between members, while stimulating construc- tive debate and preventing influential behavior of 1 opinion to formulate suggestions about a specific ques- tion. Since perfect agreement is seldom reached, the objective of the consensus methodology is to identify a central tendency among the group and grade the level of agreement reached. Consensus methods are increas- ingly being used to develop diagnostic, classification, therapeutic guidelines and response criteria sets in var- ious fields of medicine including rheumatology (5,6). Our review focuses on details of the consensus meth- odologies along with the relative advantages and disad- vantages of each method. There are other formal quan- titative methods to combine information from the literature such as meta-analysis and systematic review and the purpose of this review is to concentrate on consensus when other quantitative methods are not possible either due to paucity or conflict in the literature. DEFINING CONSENSUS Consensus does not require full agreement among partic- ipants (7). A prespecified range is decided by the group running or leading the consensus methodology, accord- ing to the needs of the task to be performed. There are several methods for defining consensus with examples be- low (8): *Department of Medicine, Washington Hospital Center, Washington, DC. †Department of Medicine, University of Pittsburgh, Pittsburgh, PA. ‡Department of Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA. 1 Both authors contributed equally to the manuscript. Funding Source: No funding was obtained for the study design, collection, analysis, and interpretation of data; in the writing of the manuscript; and in the decision to submit the manuscript for publication. Dr. Khanna was supported by a National Institutes of Health Award (NIAMS K23 AR053858-04). None of the authors have competing interests related to the content of this manuscript. Address reprint requests to: Dinesh Khanna, MD, MSc, Division of Rheumatology, Department of Medicine, David Geffen School of Medicine, 1000 Veteran Avenue, Rm 32-59, Rehabilitation Building, Los Angeles, CA 90095. E-mail: dkhanna@ mednet.ucla.edu. METHODOLOGY 95 0049-0172/11/$-see front matter © 2011 Elsevier Inc. All rights reserved. doi:10.1016/j.semarthrit.2010.12.001

Upload: raj-nair

Post on 12-Sep-2016

216 views

Category:

Documents


3 download

TRANSCRIPT

Page 1: Methods of Formal Consensus in Classification/Diagnostic Criteria and Guideline Development

ltvffinpitnsp

asI

m

DRm

METHODOLOGY

Methods of Formal Consensus in Classification/DiagnosticCriteria and Guideline Development

Raj Nair, MD,*,1 Rohit Aggarwal, MD, MS,†,1 and Dinesh Khanna, MD, MSc‡

Guidelines or diagnostic criteria in clinical practice assist physicians in their clinical decision-making and improve health outcomes for patients. Diagnostic and classification criteria should bebased on evidence from rigorously conducted controlled studies. Formal group consensus methodshave been developed to organize subjective judgments and to synthesize them with the availableevidence. This review discusses 4 types of formal consensus methods used in the health field andtheir applications in rheumatology: the Delphi method, Nominal Group Technique, RAND/UCLA Appropriateness Method, and National Institutes of Health consensus developmentconference.© 2011 Elsevier Inc. All rights reserved. Semin Arthritis Rheum 41:95-105Keywords: guideline criteria, formal consensus methods, Delphi method, nominal group technique,RAND/UCLA Appropriateness Method, National Institutes of Health consensus development, confer-

ence, GRADE, diagnostic criteria

Faatotocoiti

ovtltwd

D

Ciris

Physicians are faced with diagnostic and treatmentdilemmas on a daily basis in their clinic setting.Well-developed diagnostic or treatment guide-

ines in clinical practice summarize data for physicianso improve their clinical decision-making for an indi-idual patient (1). Evidence-based guidelines derivedrom rigorously conducted controlled studies outper-orm expert opinion; however, in practice there arenstances where sufficient research-based evidence doesot exist (2). Various group consensus methods incor-orate opinions of a group of experts rather than anndividual expert in a formalized manner (3,4). Givenhe diversity of opinion that can occur with the diag-osis and management of disease, formal group con-ensus methods organize subjective judgments androvide guidance when there is no existing evidence.

*Department of Medicine, Washington Hospital Center, Washington, DC.†Department of Medicine, University of Pittsburgh, Pittsburgh, PA.‡Department of Medicine, David Geffen School of Medicine at UCLA, Los Angeles,CA.

1Both authors contributed equally to the manuscript.Funding Source: No funding was obtained for the study design, collection, analysis,

nd interpretation of data; in the writing of the manuscript; and in the decision toubmit the manuscript for publication. Dr. Khanna was supported by a Nationalnstitutes of Health Award (NIAMS K23 AR053858-04).

None of the authors have competing interests related to the content of thisanuscript.Address reprint requests to: Dinesh Khanna, MD, MSc, Division of Rheumatology,

epartment of Medicine, David Geffen School of Medicine, 1000 Veteran Avenue,

lm 32-59, Rehabilitation Building, Los Angeles, CA 90095. E-mail: [email protected].

0049-0172/11/$-see front matter © 2011 Elsevier Inc. All rights reserved.doi:10.1016/j.semarthrit.2010.12.001

ormal group consensus methods allow for inclusion ofwide range of knowledge and experience, and inter-

ction between members, while stimulating construc-ive debate and preventing influential behavior of 1pinion to formulate suggestions about a specific ques-ion. Since perfect agreement is seldom reached, thebjective of the consensus methodology is to identify aentral tendency among the group and grade the levelf agreement reached. Consensus methods are increas-ngly being used to develop diagnostic, classification,herapeutic guidelines and response criteria sets in var-ous fields of medicine including rheumatology (5,6).

Our review focuses on details of the consensus meth-dologies along with the relative advantages and disad-antages of each method. There are other formal quan-itative methods to combine information from theiterature such as meta-analysis and systematic review andhe purpose of this review is to concentrate on consensushen other quantitative methods are not possible eitherue to paucity or conflict in the literature.

EFINING CONSENSUS

onsensus does not require full agreement among partic-pants (7). A prespecified range is decided by the groupunning or leading the consensus methodology, accord-ng to the needs of the task to be performed. There areeveral methods for defining consensus with examples be-

ow (8):

95

Page 2: Methods of Formal Consensus in Classification/Diagnostic Criteria and Guideline Development

P

Tisfieltqptnpuowghagibppwepmmto

tmr

fnip

IEW

Actpeddstefcisls

cbostwbs

F

TuT(spt

96 Formal consensus in classification/diagnostic criteria

● A final vote determining percentage agreement (eg,80%) among participants

● Rating scale—a specified mean rating must be achievedon each topic for inclusion by group

● A majority of participants must give a topic a certainrating for inclusion of a topic (as detailed later in theRAND-UCLA methodology)

ARTICIPANTS

he composition of the group is important in determin-ng the decision reached. Most agree that a participanthould be an expert with credibility in the appropriateeld (8,9). An expert can be a clinician with vast clinicalxpertise, a researcher who is well versed with the currentiterature, or a layperson or patient who has experiencedhe impact of the disease or intervention or condition inuestion (4,10). Ideal characteristics of a group partici-ant would be researchers and clinicians with expertise inhe field providing face validity and facilitating dissemi-ation of consensus meeting results (8). In addition, thearticipants should be enthusiastic about the project andnderstand the demands and responsibilities of it. Heter-geneity of the participants (different levels of expertiseithin a field, different fields of expertise, different geo-raphical areas and population ethnicity, or differentealth care policies in different countries) has advantagesnd disadvantages to the consensus process. A diverseroup may lead to better performance than homogeneityn terms of considering all relevant areas of the topic (11)ut may also lead to disagreement among group partici-ants due to different diagnostic and therapeutic ap-roaches based on geographic area. Also, a participantith higher authority may exert a greater degree of influ-

nce on the group when the consensus group compriseseople of diverse status, whereas consensus derived by aore uniform group will be more likely to reflect theajority view (12). In our view, heterogeneity of the par-

icipants brings more credibility to the consensus meth-dology and a more robust, creative product is developed.

In addition, larger groups will increase the reliability ofhe judgment (13), but may be difficult to coordinate andore time consuming. In general, reliability of consensus

ecommendations declines rapidly when the group size

Table 1 Characteristics of Informal and Formal Consensus

ConsensusDevelopment

Method

MailedQuestionnairesa/

Survey

Private DecisionsElicited Prior to

GroupDiscussion?

Formalof Grou

Delphi Yes Yes YNGT No Yes YRAND Yes Yes YConsensus

dev. Conf.No No N

aPostal survey, internet, fax.

alls below 6, and, above 12, improvement in reliability isot substantial (13). An exception is the Delphi and nom-

nal group techniques that can recruit a larger number ofarticipants.

MPORTANCE OFVIDENCE FROM LITERATUREHEN USING GROUP CONSENSUS

lthough group consensus participants are generally re-ruited on the basis that they have superior knowledge ofhe published literature in the field, it is essential to sup-lement participants with up-to-date, evidence-based lit-rature. When literature reviews are provided, the evi-ence is used both in the initial discussion and in laterecision-making (14). A literature review (preferably aystematic review) provides a common starting point forhe group and increases the perception that the process isvidence-based (10). The amount of information variesrom no information because available evidence is weak toomprehensive synthesis of relevant research when theres abundant information in literature but no general con-ensus (8). Any level of information, however small, isikely to influence the group decision (12) and thushould be provided.

In the development of clinical guidelines for care, in-orporating an evidence base has been shown to result inetter guidelines. In a retrospective study comparing setsf guidelines, 6 sets of breast cancer guidelines were clas-ified as evidence-based, consensus-based, or a combina-ion of both. Instruments to measure quality of guidelinesere used to grade the sets of guidelines. The evidence-ased guidelines scored highest on quality, whereastrictly consensus-based guidelines scored the lowest (15).

ORMAL CONSENSUS METHODS

he focus will be on 4 types of formal consensus methodssed in the health field: Delphi method, Nominal Groupechnique, RAND/UCLA Appropriateness Method

RAM), and National Institutes of Health (NIH) consen-us development conference methodology (Table 1). Inractice, a combination of 2 formal consensus methods orheir modifications can be used in a 2-step process, where

pment Methods modified from (20)

ackice

Face-to-FaceContact

InteractionStructured

IncorporatesEvidence

AggregationMethod

No Yes � ExplicitYes Yes � ExplicitYes Yes ��� ExplicitYes No � Implicit

Develo

Feedbp Cho

eseseso

Page 3: Methods of Formal Consensus in Classification/Diagnostic Criteria and Guideline Development

R. Nair, R. Aggarwal, and D. Khanna 97

1 method is used for item generation or some initial con-sensus and then the other method is used for final consen-sus. When there has not been rigid adherence to 1 con-sensus methodology, the description of the procedure inthe Methods section is referred to as a “modified” consen-sus method.

Delphi Methods or Technique

The Delphi method originated in the 1960s and takes itsname from the Delphic oracle’s skills of interpretationand foresight (16). It was developed at the RAND Cor-poration to obtain the most reliable consensus of opinionof a group of experts on a subject in a systematic manner(17,18). It attempts to do this by a series of well-definedquestionnaires based on surveys and feedback. UsuallyDelphi is applied when a consensus among a large num-ber of participants is needed who cannot meet simultane-ously for logistic or economic reasons (19).

Process

The first step is to define a specific problem or question tobe solved by the Delphi process (Fig. 1) (20). The purposefor the Delphi should be made clear. For example, in thecase of criteria set development, it is important to make itexplicit that the goal of the Delphi process is to developdiagnostic/classification criteria set for the specific clinicalentity.

Before Round 1

The team (or Steering Committee) undertaking the Del-phi invites clinical or research experts to provide opinionson a specific matter and participate in subsequent ques-tionnaire rounds. The panel of participants must be ade-

Define a question

Invite panel of experts

•Equ•Q

First round of Delphi

Second round of Delphi

•Edis•D•Rin

•M

Third round of Delphi • Ein

Results analyzed

Consensus reached ?

Yes, report results

Lit

Figure 1 Schematic diagram of Delphi process step by step. (Colo

quate (can be hundreds, but at least 10-30) and shouldrepresent multidisciplinary leaders from various geo-graphic areas on the particular subject in consideration.Participants should be motivated and interested in theproblem to be solved, which will help ensure adequateresponse rates to the survey.

Sometimes a thorough literature search is performed bythe team undertaking the Delphi process to study theexisting evidence, which may help to define the first set ofquestions. The results of the literature search and the rel-evant original publications are sent to the participant tofacilitate the process. Given the privacy issues, it has nowbecoming standard first to send an invitation letter to thegroup for participation in the Delphi process. This initialmail helps establish relationship, verify e-mail address,and provide the denominator that will be used to calculatethe response rate. For example, in an ongoing effort todevelop outcome measures for connective tissue disease–interstitial lung disease clinical trials, appropriate expertpanelists were identified via an international investigationof specialist associations, study investigators of multi-center endpoint studies, members of consensus commit-tees and task forces, members of disease specific groupsand authors of key publications, in journals and text-books. Patients with connective tissue disease-interstitiallung disease are also included in this effort.

Round 1

The participants are asked broad open-ended questionson the Delphi topic to elicit different opinions. Thesequestions are usually more descriptive to elicit the expert’sopinions, novel ideas, and their experience on a specificsubheading/question. The opinions gathered are groupedtogether under a limited number of headings and state-

opinions are categorized under in a aire formnaire are mailed out to all experts

score their agreement and ent in a quantitative mannerf agreement is analyzeduestionnaire is generatedting 1st round response

ack for another round of response

rescore agreement or disagreement t of group’s responses

No

review & data presented to experts

Repeat 3rd round

xperts estionnuestion

xperts agreemegree oepeat qcorporaailed b

xpertsthe ligh

erature

r version of figure is available online.)

Page 4: Methods of Formal Consensus in Classification/Diagnostic Criteria and Guideline Development

98 Formal consensus in classification/diagnostic criteria

ments drafted for circulation to all participants in ques-tionnaire form. This initial survey should be as simple aspossible and is ideally no longer than a page. Researcherscan send detailed instructions of the process and ask theparticipants to provide group and items relevant to theproject. Alternately, they can propose certain groups rel-evant to the study and have the participants propose ad-ditional groups and items (eg, organ systems and patientreported outcome measures as done recently in sclero-derma [21]).

Round 2

The replies from the participants in round 1 are analyzedwith simple descriptive statistics and summarized to gen-erate a series of statements. At this point there may beconsensus on some issues. Those issues not reaching con-sensus are developed into a focused second questionnaireand sent back to the individual participants. Feedbackfrom round 1 is also reported to the participants. Depend-ing on the question, the participants either answer fo-cused questions in a yes/no manner to reach immediateconsensus or rank their agreement or disagreement witheach statement (or items) in the second round. The ques-tions should be clearly and concisely stated and responseselicited should be in a simple and straightforward fashion.

Round 3

In the second survey, the panelists are provided with theoverall results and his/her previous reply/scores. Expertsthen reconsider their previous opinion and re-rate theirlevel of agreement with each statement in the question-naire, with the opportunity to change their score in viewof the group’s response. The re-ratings are summarizedand assessed for degree of consensus: if an acceptabledegree of predefined consensus (which should be pre-defined before initiation of the Delphi) or when suffi-

Advantages and Disadvantages of Delphi Technique

Advantages● Large number of participants possible (23)● Each participant expresses their opinion freely and

impersonally● Limits dominance by eminent, eloquent, or highly

opinionated individuals in the field● Less likely that the moderator of the panel may

bias the group● Substantial amount of time to express ideas,

reflect on answers, and make changes

● Cheap, convenient, and no geographicalconstraints

● Easy to understand, flexible, and can be appliedto broad range of topics

● Can be used preceding NGT meeting for initialitem generation

cient information exchange has been obtained is ob-

tained, the process may cease, with final results fedback to participants; if not, the third round is repeated.Ideally consensus should be reached by round 3 but intheory many more rounds can be done. Researchershave used a predefined cut of the mean/median values(21) or done cluster analysis to discriminate importantfrom unimportant items (22).

In summary, the process is conducted usually over 2 to4 “rounds,” with the results elicited, tabulated, and re-ported to the group after each round (Box 1).

Examples in Rheumatology

Delphi strategies have been used to solve an array of prob-lems in health and medicine, from the needs of an indi-vidual hospital or department to those of a statewideagency or state (23,24). Delphi technique has been widelyused in the past in the internal medicine and rheumatol-ogy literature to develop diagnosis and managementguidelines. Specifically, in the rheumatology literature,Delphi has been used for criteria set development, espe-cially for reaching consensus on the outcome and re-sponse criteria set in diseases like systemic sclerosis, sys-temic lupus erythematosus, gout, ankylosing spondylitis,and psoriatic arthritis (25-27). Use of Delphi exercise inclassification criteria set development has been limitedand is mostly used in the initial phase to gather data andopinion in an open-ended and systematic manner or tomake initial consensus on broad categories of variablesthat are further narrowed down by other consensus meth-ods like Nominal Group Technique (NGT) in the secondpart of the consensus development process (5,6,28-30).

Recommendations for Use

Delphi exercise is best used for reaching consensus on

Disadvantages● Generalizability of the study findings (external validity)● Dependent on questionnaire design● Vulnerability with respect to who is an “expert”● Obliviousness to reliability measurement and scientific

validation of findings (24)● Potential for bias exists in participant selection

● Consensus panel judgments influenced by panelcomposition and by feedback given during the panelprocess (25)

● Coordinating large groups and several rounds can becomplicated and costly

● Delphi does not allow any personal contact betweenthe experts

variables to be used in outcome or response criteria set.

Page 5: Methods of Formal Consensus in Classification/Diagnostic Criteria and Guideline Development

R. Nair, R. Aggarwal, and D. Khanna 99

Nominal Group Technique

NGT is derived from social-psychological studies of deci-sion conferences and management science studies. It is aface-to-face structured group meeting of experts that is ledby an experienced moderator (8). NGT was historicallyused in the social services, government, education, andindustry since the 1960s.

In NGT, a question is generated for which an expertpanel is convened. The expert panel may consist of 5 to9 panelists or could be larger. Larger groups can beseparated into several groups (or tables) of 5 to 9 par-ticipants that work simultaneously on the same ques-tions (31).

In brief, NGT asks for silent and independent genera-tion of ideas, the round-robin listing the ideas, serial dis-cussion led by a moderator, and independent ranking ofthe ideas.

NGT is preferably done in a room with a rectangulartable arranged in an open “U” with a flip chart or largescreen computer at the open end of the table. Each pan-elist is asked to generate ideas to specific questions andrecord privately on a piece of paper for 5 to 10 minutes. Inthe second phase, in round-robin fashion, participants areasked to provide each of their ideas, which are listed on aflip chart or wide-screened computer visible to the entiregroup. No discussion is conducted at this time. In thethird phase, a serial brief discussion is led by the modera-tor; the goal of the discussion is clarification of the ideas orstatements (31). During the last phase, each idea is pri-vately ranked or rated on a scale of 1 to 5 or 1 to 10. Thehighest ranking solutions are kept while the remainingsolutions are discarded. Since the NGT can be influencedby strong personalities, the process requires that at eachround the first person to speak is different from the pre-vious one; this means that the first round will start withthe first person to the left of the moderator, the secondround with the second person to the left of the moderator,etc. In this way all people will have the possibility to speakfirst and avoid the undue influence of strong personalities(19). There are no guidelines regarding the cutoff forwhat constitutes a high versus low ranking. Generally 70to 80% consensus is required (32), although researchersmay chose to have a lower cutoff, but this should be pre-

Advantages and Disadvantages of Nominal Group Techniq

Advantages● Participants meet face-to-face

● All participants have an opportunity to voiceopinions

● Personal contact between experts

● Design of NGT does not allow any individual todominate

● Group voting can occur if desired in later rounds

defined (21).

It is important to remember that the purpose of NGTis to establish a prioritization of ideas and issues, and theuse of numerical voting can assist in this (33). However,the temptation to attach greater meaning to the numbersand carry out more sophisticated quantitative analysisshould be resisted.

There are several variations of this process. Meetingscan occur in several rounds with initial responses collectedvia mail (Delphi technique) followed by a future face-to-face-meeting. After a discussion period, additional roundsof generating ideas or re-ranking responses can occur. Asused with the RAND/UCLA method (detailed later),synthesized evidence can be provided to the expert panelbefore submitting solutions for the question that has beengenerated (Box 2).

Examples in Rheumatology

While often used in combination with Delphi methodol-ogy, there are some classification criteria sets developedwith NGT for classification of childhood vasculitis andjuvenile systemic sclerosis (5,6). Other pediatric rheuma-tologic conditions for which criteria have been developedalso include juvenile SLE (34,35), juvenile idiopathic ar-thritis (36,37), and inflammatory myositis (29,38,39). Acombination of Delphi and NGT has also been used inthe development of provisional core set items (21). Otherexamples of NGT used in rheumatology include develop-ing criteria for clinical outcomes in rheumatoid arthritiswith the Outcome Measures in Rheumatoid ArthritisClinical Trials committee (40,41) as well as developmentof the British Isles Lupus Assessment Group’s disease ac-tivity index (42).

Recommendations for Use

The primary use of NGT is to give priority to questions/items to be discussed. It can be adapted to be used forother circumstances such as:

1. Developing classification and response criteria.2. Developing consensus practice guidelines.3. Developing ideas for exploratory research (43).

This method is often used in combination with the

Disadvantages● Certain members of the panel can take over discussion

and drive results—experienced moderator required● Limited by time—only a few questions can be

discussed and agreed on● Economic and time costs associated with face-to-face

meeting● Limited to providing a solution to a few problems

limits its applicability to multiple scenarios

ue

Delphi method as described above. Pure NGT is useful

Page 6: Methods of Formal Consensus in Classification/Diagnostic Criteria and Guideline Development

100 Formal consensus in classification/diagnostic criteria

when solutions need to be developed at a face-to-facemeeting and expert opinion weighs in more than evidencewhich is lacking for the particular topic.

RAND-UCLA Appropriateness Method

RAM was a method of group consensus developed in the1980s by RAND (research and development) Corpora-tion and UCLA (University of California-Los Angeles)(44). Using current scientific evidence in conjunctionwith expert opinion, this consensus method was initiallydeveloped to evaluate the overuse/underuse of medical orsurgical procedures. This procedure has been used in theUnited States for assessing appropriateness of proceduressuch as coronary angiography, carotid endarterectomy,hysterectomy, and upper gastrointestinal tract endoscopy(45-47).

This process usually involves 2 interdependent groups:a core panel and an expert panel. The core panel guidesthe expert panel through the tasks of the RAM and pro-vides synthesized data to the expert panel and the expertpanel uses the data provided by the core panel to come toa consensus.

Before the consensus process, the core panel will con-duct a systematic literature review with evidence synthesisto provide the expert panel with all pertinent informationthat will guide evidence-based decision-making (48).Then, a list of clinical scenarios (indications) is developedto provide the expert panel. These clinical scenarios willdescribe a patient with a set of characteristic features andthe expert panel will be given a Likert scale (typically9-point) that will ask whether a particular intervention isappropriate for the patient.

There are 2 rounds of rating appropriateness of anintervention. In the first round, the expert panel re-ceives the clinical scenarios by mail (or via the internetas discussed in the Delphi section) and are asked to ratethe appropriateness (on a 1 to 9 Likert scale) of anintervention. In RAM, a rating between 1 and 3 isconsidered “inappropriate” (risks outweigh benefits), 7to 9 is considered “appropriate” (benefits outweighrisks), and 4 to 6 is “uncertain.” These decisions aremade without the consideration of the cost. They areasked to do this independent of the other panelists.They are allowed to use the synthesized evidence pro-vided by the core panel overseeing the consensus pro-cess. The panel is asked not to consider the cost ofperforming the procedures or intervention in the ratingof their scenarios (44). There are other definitions pro-posed by RAM but this works best for a panel of 7 to 15members and is most widely used.

The second round is conducted as a face-to-facemeeting over 1 to 2 days and is led by an experiencedmoderator. The expert panel was historically restrictedto 9 panelists but can range from 7 to 15 panelists (44).In general, an odd number is preferred to prevent a

“tie.” It is recommended that a multidisciplinary panel

is selected so that bias from a like-minded group withidentical agenda is avoided (49). All panel experts aregiven results of other experts’ individual ratings for allclinical scenarios. Panel experts are given an opportu-nity to discuss individual views on the appropriatenessof the intervention in each clinical scenario. At the endof the discussion each panelist can reconsider theiroriginal rating and re-rate the clinical scenario. Theresults are then summarized using descriptive statistics.Disagreement is when one third or more panelists ratescenario in lowest the 3 points and one third or morerate the same scenario in the highest 3 points of appro-priateness scale. In the absence of disagreement themedian rating in the lower 3 points is “inappropriate”;the median rating in the upper 3 points is “appropri-ate,” and the rating in the 4 to 6 range or those withdisagreement is “uncertain.” These ratings can finallybe used to determine whether an intervention has beenused inappropriately retrospectively or determinewhether an intervention should be performed prospec-tively. Depending on the number of participants in-cluded, there can be some variation in the definitions ofagreement.

Based on the ratings of the expert panel, an indicationin each clinical scenario will be considered appropriate,uncertain, or inappropriate; however, it lacks a clear rat-ing of the evidence or a ranking of the scientific robustnessof the recommendations.

While RAM was initially used for assessing appropri-ateness of a procedure, it has been applied to the develop-ment of practice guidelines and classification criteria.Once indications have been ranked by appropriatenessusing a Likert scale as described above, summary state-ments reflecting the output from the consensus exerciseare used to make a set of classification criteria or practiceguidelines (50). In making summary statements, both theperceived appropriateness of a certain intervention orguideline as well as the agreement among participants arecritical (Box 3).

Examples in Rheumatology

This methodology has been used for the development ofquality indicators in rheumatology (51,52). Recommen-dations have been developed using this methodology in-cluding recommendations for clinical agent evaluation fortreatment in osteoporosis, which was endorsed by theAmerican Society for Bone and Mineral Research, theInternational Society for Clinical Densitometry, andthe National Osteoporosis Foundation (53). Recently,quality indicators for systemic lupus erythematosus (54)have been published as well as recommendations for eval-uating and treating rheumatoid arthritis (10,55).

Recommendations for Use

Any project that needs a formal method for combining

scientific evidence from the literature and expert consen-
Page 7: Methods of Formal Consensus in Classification/Diagnostic Criteria and Guideline Development

R. Nair, R. Aggarwal, and D. Khanna 101

sus might consider RAM. Specifically, it can be used inthe following:

1. Developing clinical practice guidelines where evidenceis not sufficient.

2. Developing quality of care indicators.3. Determining appropriate use criteria for a medical in-

tervention.4. Determining overuse or underuse of a procedure.

This rigorous method can be used when resources andtime are available and can be used in any circumstancewhere a thorough combination of scientific evidence andexpert opinion is required.

Consensus Development Conference

In 1997, the NIH in the United States introduced theconsensus development conference (CDC) (8). TheCDC brings together selected experts in the field andconcerned individuals (from expert to public) to reachgeneral agreement about the safety, efficacy, and ap-propriateness of using various medical procedures,drugs, and devices. The Office for Medical Applica-tions of Research (OMAR) of the NIH is charged withresponsibility of this conference (56). OMAR’s pri-mary goal is to help bring the results of biomedicalresearch into direct use in the practice of medicine. Tothis end, it acts to coordinate consensus developmentactivities at the NIH for evaluating current evidenceand promulgating consensus of opinions on a particu-lar topic. Each conference is jointly sponsored and ad-ministered by 1 or more Institutes or Centers (ICs) ofNIH and by the OMAR. They have run more than 100conferences on a variety of topics (57). The develop-ment of consensus conferences is a hybrid of methodsused by judicial decision-making, scientific confer-

Advantages and Disadvantages of RAND-UCLA Appropriate

Advantages● Synthesis of published literature prior to consensus

techniques incorporated● Allows for both confidential ratings as well as

group discussion● Multidisciplinary panel encourage consensus from

a wider group

● Reproducibility of RAM ranges from moderate toexcellent as determined by different panelists for“appropriate” and “inappropriate” care (46)

● Acceptable predictive validity for arecommendation supported by RCTs (47)

ences, and the town hall meeting (58).

Process

A topic for the CDC, usually of public health impor-tance, is decided by agreement between the sponsoringIC and OMAR. A set of predetermined questions onwhich a consensus is warranted is selected from thetopic by experts appointed by sponsoring IC andOMAR, which defines the scope of the conference. Adifferent selected group of experts (around 10 experts)is invited and brought together to form an expert panelor decision-making group for the conference. Thepanel is an independent, broad-based, non-Depart-ment of Health and Human Services, nonadvocacygroup with appropriate expertise. The panel membersare highly regarded in their own fields but are notclosely aligned with the subject. The evidence is thenpresented to the conference by various experts ap-pointed by sponsoring IC and OMAR, who are not amember of the decision-making group. Usually, theAgency for Health Care Research and Quality providesa systematic review of literature on the conference topicthrough 1 of its Evidence-Based Practice Centers. Thegroup or expert panel hears the scientific data presentedby invited experts as well as the comments from thegeneral public in a public (open) session followed bydiscussion. The format allows the participation of theaudience (members of the public) in an open meeting,who can ask questions to the group. The panel thenmeets in a private (executive) session to further delib-erate on the evidence and discussion to reach a consen-sus. The chairperson is the moderator responsible forguiding and controlling the proceedings of both theopen part of the conference and the private group dis-cussion as well as for helping reaching consensus on thetopic/question where 2 or more experts differ in theiropinion. The panel weighs the information and reachesa consensus statement that addresses a set of predeter-mined questions. The consensus statement draft is then

ethod

Disadvantages● Misclassification is expected (49)

● Takes great deal of time from gathering of theevidence to multiple rounds of consensus

● Face-to-face, which can add cost/time delayand lead to highly opinionated individuals inthe field dominating the discussion

● Requires third party (core panel) to constructclinical indications for an intervention andanalyze/interpret the results from the expertpanel meeting

● 9-point Likert scale can be cumbersome

● Requires voting on multiple case scenarios(sometimes �1000)

ness M

presented in a plenary session and is subject to review

Page 8: Methods of Formal Consensus in Classification/Diagnostic Criteria and Guideline Development

i

R1

2

etlm

U

Uttm6Ct

102 Formal consensus in classification/diagnostic criteria

and comment by conference attendees. Following thediscussion, the panel may then modify the statement ifappropriate in the final executive session. The finalconsensus statement is then released and disseminatedwidely to achieve maximum impact on health carepractice and medical research. This is an independentreport and is not considered a policy of NIH or thefederal government.

A major contribution of the NIH consensus panels hasbeen to describe current levels of agreement on importanttopics like coronary artery bypass surgery, intraocular lensimplantation, cesarean section, Reye’ syndrome, and thetreatment of breast cancer (Box 4).

Examples in Rheumatology

Use of CDC in the field of rheumatology has been limitedto systemic review of diagnosis and therapeutic options onthe topics like osteoporosis, total knee and hip replace-ment, as well as recommendations for intravenous immu-noglobulin therapy (59-62). There have been no criteria setdeveloped by NIH CDC meeting in any field of medicinencluding rheumatology.

ecommendations for Use. CDC conferences are particularly useful for provid-

ing guidance when a controversy exists over preven-tive, therapeutic, or diagnostic options for publicpolicy

. When the issue is of high degree of public interest orhas impact on health care cost.

It is intended as a statement that reflects the views ofxpert panel that have examined and discussed the scien-ific data, rather than a legal document or a practice guide-ine. The statement may reflect uncertainties, options, or

inority viewpoints.

se in Rheumatology

se of CDC in the field of rheumatology has been limitedo systemic review of diagnosis and therapeutic options onhe topics like osteoporosis, total knee and hip replace-ent, as well as recommendations for IVIG therapy (59-

2). There have been no criteria set developed by NIHDC meeting in any field of medicine including rheuma-

Advantages and Disadvantages of NIH Consensus Develop

Advantages● Mix of practicing physicians, researchers, consumers, an

others to come together and jointly evaluate an existingtechnology

● Wide circulation through both lay and medical media

● Unbiased panel

ology.

GRADING OF RECOMMENDATIONSASSESSMENT, DEVELOPMENT, ANDEVALUATION (GRADE)

This review also discusses GRADE as it has been adoptedby scientific world for the development of recommenda-tions. GRADE is not a consensus methodology per se butuses consensus methods discussed above (especially Del-phi and NGT) to assess quality and strength of a recom-mendation. GRADE methodology provides a systematicand transparent approach to rate the quality of evidenceand grade the strength of recommendations for patientimportant outcomes. GRADE was developed by expertswith a goal to answer some drawbacks of previous meth-ods, which include the lack of separation between qualityof evidence and strength of recommendation and the lackof explicit acknowledgment of values and preferences(63). Several organizations and guideline developers, in-cluding the World Health Organization, the AmericanCollege of Chest Physicians, the American Endocrine So-ciety, and UpToDate, have adopted the GRADE system.GRADE Quality of Evidence is graded into 1 of 4levels— high, moderate, low, and very low. RCTs usu-ally get a higher grading compared with observationalstudies. The system allows the quality of evidence de-rived from observational data to be upgraded from lowto moderate or high categories and the quality of evi-dence coming from randomized trials to be down-graded depending on the design and execution of suchstudies. The strength of recommendation is dividedinto strong (benefits far outweigh risks or that risks faroutweigh benefits and virtually all patients will makesame decisions) and weak (whereas tradeoffs betweenrisks and benefits are less clear and patient’s values andpreferences are incorporated before making a treat-ment plan). Factors influencing strength of recommen-dations may include quality of evidence, balance be-tween desirable and undesirable effects, individual’svalues and preferences for a treatment, and resourceutilization (63,64).

Examples in Rheumatology

One example in rheumatology is the American College ofRheumatology guidelines in osteoarthritis (unpublished).Other examples in the literature include use in reaching

Conference

Disadvantages● Interaction is not structured

● The aggregation methodology used is implicit—aformal feedback system is lacking

● CDC has not been used for making new criteria sets

ment

d

decisions on practice guidelines (65) and grading recom-

Page 9: Methods of Formal Consensus in Classification/Diagnostic Criteria and Guideline Development

1

1

1

1

11

11

1

1

2

2

2

2

2

2

2

R. Nair, R. Aggarwal, and D. Khanna 103

mendations for the use of thrombolytic agents and pre-vention of coronary artery disease (64,66).

Recommendations for Use1. To develop clinical guidelines.

LIMITATIONS OF REVIEW

Although we present a comprehensive review of formalconsensus methods, we did not conduct a systematic re-view. In our comprehensive review, the majority of thearticles on formal consensus methodologies are imple-mentations in diagnostic/classification criteria, guide-lines, and response indices development.

CONCLUSION

Formal consensus methodologies have various applica-tions in the medical literature from guideline develop-ment to development of criteria sets. These methods arebeing increasingly used in rheumatology. Each method-ology has its unique attributes, and utilization of a partic-ular methodology or combination of methodologiesdepends on the following: (1) clinical question; (2) au-dience; and (3) available resources.

When closely tied to evidence-based literature, thesesummarized statements can provide physicians with guid-ance in classifying and treating disease. Dangers withgroup consensus occur when there is no rigor in the meth-ods used and when the ensuing recommendations arenonspecific (67). Also, consensus methodologies provideareas/strength of agreement (and disagreement) andshould not replace evidence-based literature.

ACKNOWLEDGMENTS

We would like to acknowledge Drs. Carol Wallace andGillian Hawker for providing constructive feedback dur-ing the writing of this manuscript.

REFERENCES

1. Lugtenberg M, Burgers JS, Westert GP. Effects of evidence-basedclinical practice guidelines on quality of care: A systematic review.Br Med J 385;18(5):385.

2. Chassin M. How do we decide whether an investigation or pro-cedure is appropriate. Appropriate investigation and treatment inclinical practice. London: Royal College of Physicians; 1989. p.21-9.

3. Agency for Healthcare Policy and Research. AHCPR clinicalpractice guideline program. Report to Congress. 1995.

4. Black N, Murphy M, Lamping D, McKee M, Sanderson C,Askham J, et al. Consensus development methods: A review ofbest practice in creating clinical guidelines. J Health Serv ResPolicy 1999;4(4):236-48.

5. Ozen S, Ruperto N, Dillon MJ, Bagga A, Barron K, Davin JC, etal. EULAR/PReS endorsed consensus criteria for the classificationof childhood vasculitides. Ann Rheum Dis 2006;65(7):936-41.

6. Zulian F, Woo P, Athreya BH, Laxer RM, Medsger TA Jr, Leh-man TJ, et al. The pediatric rheumatology European Society/

American College of Rheumatology/European League Against

Rheumatism provisional classification criteria for juvenilesystemic sclerosis. Arthritis Rheum 2007;57(2):203-12.

7. Linstone HA, Turoff M. The Delphi method: Techniques andapplications. Reading (MA): Addison-Wesley Pub. Co., Ad-vanced Book Program; 1975.

8. Fink A, Kosecoff J, Chassin M, Brook RH. Consensus methods:Characteristics and guidelines for use. Am J Public Health 1984;74(9):979-83.

9. Jones J, Hunter D. Consensus methods for medical and healthservices research. BMJ 1995;311(7001):376-80.

0. Saag KG, Teng GG, Patkar NM, Anuntiyo J, Finney C, CurtisJR, et al. American College of Rheumatology 2008 recommenda-tions for the use of nonbiologic and biologic disease-modifyingantirheumatic drugs in rheumatoid arthritis. Arthritis Rheum2008;59(6):762-84.

1. Jackson S. Team composition in organizational settings: Issues inmanaging an increasingly diverse work force. In: Worchel S,Wood W, Simpson J, editors. Group Process and Productivity.Newbury Park (CA): Sage; 1992. p. 138-73.

2. Vinokur A, Burnstein E, Sechrest L, Wortman PM. Group deci-sion making by experts: Field study of panels evaluating medicaltechnologies. J Pers Soc Psychol 1985;49(1):70-84.

3. Kahan JP, Park RE, Leape LL, Bernstein SJ, Hilborne LH, ParkerL, et al. Variations by specialty in physician ratings of the appro-priateness and necessity of indications for procedures. Med Care1996;34(6):512-23.

4. Jacoby I. Evidence and consensus. JAMA 1988;259(20):3039.5. Cruse H, Winiarek M, Marshburn J, Clark O, Djulbegovic B.

Quality and methods of developing practice guidelines. BMCHealth Serv Res 2002;2(1):1.

6. Dalkey NC. Delphi. Santa Monica (CA): Rand; 1967.7. Helmer-Hirschberg O. The use of the Delphi technique in prob-

lems of educational innovations. Santa Monica (CA): Rand;1966.

8. Dalkey NC, Helmer-Hirschberg O. An experimental applicationof the Delphi method to the use of experts. Santa Monica (CA):Rand; 1962.

9. Delbecq AL, Van de Ven AH, Gustafson DH. Group techniquesfor program planning: A guide to nominal group and delphi pro-cesses. Middleton (WI): Green Briar Press; 1975.

0. Murphy MK, Black NA, Lamping DL, McKee CM, SandersonCF, Askham J, et al. Consensus development methods, and theiruse in clinical guideline development. Health Technol Assess1998;2(3):i, iv, 1-88.

1. Khanna D, Lovell DJ, Giannini E, Clements PJ, Merkel PA,Seibold JR, et al. Development of a provisional core set of re-sponse measures for clinical trials of systemic sclerosis. AnnRheum Dis 2008;67(5):703-9.

2. Distler O, Behrens F, Pittrow D, Huscher D, Denton CP, Foeld-vari I, et al. Defining appropriate outcome measures in pulmonaryarterial hypertension related to systemic sclerosis: A Delphi con-sensus study with cluster analysis. Arthritis Rheum 2008;59(6):867-75.

3. Loughlin KG, Moore LF. Using Delphi to achieve congruentobjectives and activities in a pediatrics department. J Med Educ1979;54(2):101-6.

4. Thomson WA, Ponder LD. Use of delphi methodology to gener-ate a survey instrument to identify priorities for state allied healthassociations. Allied Health Behav Sci 1979;2(4):383-99.

5. Bertsias G, Ioannidis JP, Boletis J, Bombardieri S, Cervera R,Dostal C, et al. EULAR recommendations for the management ofsystemic lupus erythematosus. Report of a task force of theEULAR standing committee for international clinical studies in-cluding therapeutics. Ann Rheum Dis 2008;67(2):195-205.

6. Gazi H, Pope JE, Clements P, Medsger TA, Martin RW, MerkelPA, et al. Outcome measurements in scleroderma: Results from a

Delphi exercise. J Rheumatol 2007;34(3):501-9.
Page 10: Methods of Formal Consensus in Classification/Diagnostic Criteria and Guideline Development

4

4

4

4

4

4

4

5

5

5

5

5

5

5

5

5

5

6

6

6

6

104 Formal consensus in classification/diagnostic criteria

27. Dernis E, Lavie F, Pavy S, Wendling D, Flipo RM, Saraux A, et al.Clinical and laboratory follow-up for treating and monitoringpatients with ankylosing spondylitis: Development of re-commendations for clinical practice based on published evidenceand expert opinion. Joint Bone Spine 2007;74(4):330-7.

28. Wallace CA, Ruperto N, Giannini E, Childhood Arthritis andRheumatology Research Alliance, Pediatric Rheumatology Inter-national Trials Organization, Pediatric Rheumatology Collabor-ative Study Group. Preliminary criteria for clinical remission forselect categories of juvenile idiopathic arthritis. J Rheumatol2004;31(11):2290-4.

29. Rider LG, Giannini EH, Brunner HI, Ruperto N, James-NewtonL, Reed AM, et al. International consensus on preliminary defini-tions of improvement in adult and juvenile myositis. ArthritisRheum 2004;50(7):2281-90.

30. Oddis CV, Rider LG, Reed AM, Ruperto N, Brunner HI, KoneruB, et al. International consensus guidelines for trials of therapies inthe idiopathic inflammatory myopathies. Arthritis Rheum 2005;52(9):2607-15.

31. Ruperto N, Meiorin S, Iusan SM, Ravelli A, Pistorio A, MartiniA. Consensus procedures and their role in pediatric rheumatol-ogy. Curr Rheumatol Rep 2008;10(2):142-6.

32. Giannini EH, Ruperto N, Ravelli A, Lovell DJ, Felson DT, Mar-tini A. Preliminary definition of improvement in juvenile arthritis.Arthritis Rheum 1997;40(7):1202-9.

33. Van de Ven AH, Delbecq AL. The nominal group as a researchinstrument for exploratory health studies. Am J Public Health1972;62(3):337-42.

34. Ruperto N, Ravelli A, Cuttica R, Espada G, Ozen S, Porras O, etal. The pediatric rheumatology international trials organizationcriteria for the evaluation of response to therapy in juvenile sys-temic lupus erythematosus: Prospective validation of the diseaseactivity core set. Arthritis Rheum 2005;52(9):2854-64.

35. Ruperto N, Ravelli A, Oliveira S, Alessio M, Mihaylova D, PasicS, et al. The pediatric rheumatology international trials Organi-zation/American College of Rheumatology provisional criteria forthe evaluation of response to therapy in juvenile systemic lupuserythematosus: Prospective validation of the definition of im-provement. Arthritis Rheum 2006;55(3):355-63.

36. Wallace CA, Ravelli A, Huang B, Giannini EH. Preliminary val-idation of clinical remission criteria using the OMERACT filterfor select categories of juvenile idiopathic arthritis. J Rheumatol2006;33(4):789-95.

37. Wallace CA, Huang B, Bandeira M, Ravelli A, Giannini EH.Patterns of clinical remission in select categories of juvenile idio-pathic arthritis. Arthritis Rheum 2005;52(11):3554-62.

38. Rider LG, Giannini EH, Harris-Love M, Joe G, Isenberg D,Pilkington C, et al. Defining clinical improvement in adult andjuvenile myositis. J Rheumatol 2003;30(3):603-17.

39. Ruperto N, Ravelli A, Murray KJ, Lovell DJ, Andersson-Gare B,Feldman BM, et al. Preliminary core sets of measures for diseaseactivity and damage assessment in juvenile systemic lupus ery-thematosus and juvenile dermatomyositis. Rheumatology (Ox-ford) 2003;42(12):1452-9.

40. Goldsmith CH, Boers M, Bombardier C, Tugwell P. Criteria forclinically important changes in outcomes: Development, scoringand evaluation of rheumatoid arthritis patient and trial profiles.OMERACT committee. J Rheumatol 1993;20(3):561-5.

41. Tugwell P, Boers M. Developing consensus on preliminary coreefficacy endpoints for rheumatoid arthritis clinical trials. OMERACTcommittee. J Rheumatol 1993;20(3):555-6.

42. Isenberg DA, Rahman A, Allen E, Farewell V, Akil M, Bruce IN,et al. BILAG 2004. Development and initial validation of anupdated version of the british isles lupus assessment group’s dis-ease activity index for patients with systemic lupus erythematosus.

Rheumatology (Oxford) 2005;44(7):902-6.

3. Gallagher M, Hares T, Spencer J, Bradshaw C, Webb I. Thenominal group technique: A research tool for general practice?Fam Pract 1993;10(1):76-81.

4. Fitch K. The Rand UCLA appropriateness method user’s manual.Santa Monica (CA): Rand; 2001.

5. Chassin MR, Kosecoff J, Solomon DH, Brook RH. How coro-nary angiography is used. Clinical determinants of appropriate-ness. JAMA 1987;258(18):2543-7.

6. Kravitz RL, Park RE, Kahan JP. Measuring the clinical consis-tency of panelists’ appropriateness ratings: The case of coronaryartery bypass surgery. Health Policy 1997;42(2):135-43.

7. Shekelle PG, Park RE, Kahan JP, Leape LL, Kamberg CJ, Bern-stein SJ. Sensitivity and specificity of the RAND/UCLA appro-priateness method to identify the overuse and underuse of coro-nary revascularization and hysterectomy. J Clin Epidemiol 2001;54(10):1004-10.

8. Lopez-Olivo MA, Suarez-Almazor ME. Developing guidelines inmusculoskeletal disorders. Clin Exp Rheumatol 2007;25(6 Suppl47):28-36.

9. Fraser GM, Pilpel D, Kosecoff J, Brook RH. Effect of panel com-position on appropriateness ratings. Int J Qual Health Care 1994;6(3):251-5.

0. Shekelle PG, Schriger DL. Evaluating the use of the appropriate-ness method in the agency for health care policy and researchclinical practice guideline development process. Health Serv Res1996;31(4):453-68.

1. MacLean CH. Quality indicators for the management of osteoar-thritis in vulnerable elders. Ann Intern Med 2001;135(8 Pt 2):711-21.

2. Mikuls TR, MacLean CH, Olivieri J, Patino F, Allison JJ, FarrarJT, et al. Quality of care indicators for gout management. ArthritisRheum 2004;50(3):937-43.

3. Silverman SL, Cummings SR, Watts NB, Consensus Panel of theASBMR, ISCD, NOF. Recommendations for the clinical evalu-ation of agents for treatment of osteoporosis: Consensus of anexpert panel representing the American Society for Bone andMineral Research (ASBMR), the International Society for Clini-cal Densitometry (ISCD), and the National Osteoporosis Foun-dation (NOF). J Bone Miner Res 2008;23(1):159-65.

4. Yazdany J, Panopalis P, Gillis JZ, Schmajuk G, MacLean CH,Wofsy D, et al. A quality indicator set for systemic lupus erythem-atosus. Arthritis Rheum 2009;61(3):370-7.

5. Furst DE, Halbert RJ, Bingham CO 3rd, Fukudome S, AndersonL, Bonafede P, et al. Evaluating the adequacy of disease control inpatients with rheumatoid arthritis: A RAND appropriatenesspanel. Rheumatology (Oxford) 2008;47(2):194-9.

6. Perry S, Kalberer JT Jr. The NIH consensus-development pro-gram and the assessment of health-care technologies: The first twoyears. N Engl J Med 1980;303(3):169-72.

7. Ferguson JH. The NIH consensus development program. Theevolution of guidelines. Int J Technol Assess Health Care 1996;12(3):460-74.

8. Lomas J. Words without action? The production, dissemination,and impact of consensus recommendations. Annu Rev PublicHealth 1991;12:41-65.

9. NIH consensus statement on total knee replacement. NIH Con-sens State Sci Statements. 2003;20(1):1-34.

0. Osteoporosis prevention, diagnosis, and therapy. NIH ConsensStatement. 2000;17(1):1-45.

1. Total hip replacement. NIH Consens Statement. 1994;12(5):1-31.

2. Intravenous immunoglobulin. Consens Statement. 1990;8(5):1-23.

3. Guyatt GH, Oxman AD, Vist GE, Kunz R, Falck-Ytter Y,Alonso-Coello P, et al. GRADE: An emerging consensus on ratingquality of evidence and strength of recommendations. BMJ 2008;

336(7650):924-6.
Page 11: Methods of Formal Consensus in Classification/Diagnostic Criteria and Guideline Development

6

6

R. Nair, R. Aggarwal, and D. Khanna 105

64. Guyatt GH, Cook DJ, Jaeschke R, Pauker SG, SchunemannHJ. Grades of recommendation for antithrombotic agents:American College of Chest Physicians evidence-based clinicalpractice guidelines (8th edition). Chest 2008;133(Suppl 6):123S-31S.

65. Jaeschke R, Guyatt GH, Dellinger P, Schunemann H, Levy MM,Kunz R, et al. Use of GRADE grid to reach decisions on clinicalpractice guidelines when consensus is elusive. BMJ 2008;337:

a744.

6. Becker RC, Meade TW, Berger PB, Ezekowitz M, O’ConnorCM, Vorchheimer DA, et al. The primary and secondary preven-tion of coronary artery disease: American College of Chest Physi-cians evidence-based clinical practice guidelines (8th edition).Chest 2008;133(Suppl 6):776S-814S.

7. Shekelle PG, Kravitz RL, Beart J, Marger M, Wang M, Lee M. Arenonspecific practice guidelines potentially harmful? A randomized com-parison of the effect of nonspecific versus specific guidelines on physician

decision making. Health Serv Res 2000;34(7):1429-48.