
    Prepared by

Gary S. Gronseth, MD, FAAN
Laura Moses Woodroffe
Thomas S. D. Getchius

Clinical Practice Guideline Process Manual

    2011 Edition

For the American Academy of Neurology (AAN) Guideline Development Subcommittee, AAN membership, and the public


    Clinical Practice Guideline Process Manual

For more information contact: American Academy of Neurology, 1080 Montreal Avenue, St. Paul, MN 55116, (651) 695-1940, [email protected]

The authors thank the following for their contributions: Julie Cox, MFA, for copyediting of this edition; Erin Hagen for her contributions to the formatting of this manual; Wendy Edlund; Yuen T. So, MD, PhD, FAAN; and Gary Franklin, MD, MPH, for their work on the 2004 edition

James C. Stevens, MD, FAAN; Michael Glantz, MD, FAAN; Richard M. Dubinsky, MD, MPH, FAAN; and Robert E. Miller, MD, for their work on the 1999 edition

Members of the Guideline Development Subcommittee for their efforts in developing high-quality, evidence-based guidelines for the AAN membership

Guideline Development Subcommittee Members:
John D. England, MD, FAAN, Chair
Cynthia L. Harden, MD, Vice Chair
Melissa Armstrong, MD
Eric J. Ashman, MD
Stephen Ashwal, MD, FAAN
Misha-Miroslav Backonja, MD
Richard L. Barbano, MD, PhD, FAAN
Michael G. Benatar, MBChB, DPhil, FAAN
Diane K. Donley, MD
Terry D. Fife, MD, FAAN
David Gloss, MD
John J. Halperin, MD, FAAN
Deborah Hirtz, MD, FAAN
Cheryl Jaigobin, MD
Andres M. Kanner, MD
Jason Lazarou, MD
Steven R. Messé, MD, FAAN
David Michelson, MD
Pushpa Narayanaswami, MBBS, DM, FAAN
Anne Louise Oaklander, MD, PhD, FAAN
Tamara M. Pringsheim, MD
Alexander D. Rae-Grant, MD
Michael I. Shevell, MD, FAAN
Theresa A. Zesiewicz, MD, FAAN

Suggested citation: AAN (American Academy of Neurology). 2011. Clinical Practice Guideline Process Manual, 2011 Ed. St. Paul, MN: The American Academy of Neurology.

© 2011 American Academy of Neurology


Table of Contents

Preface
Introduction to Evidence-based Medicine
EBM Process as Applied by the AAN
  A. Developing the Questions
    i. PICO Format
    ii. Types of Clinical Questions
    iii. Development of an Analytic Framework
  B. Finding and Analyzing Evidence
    i. Finding the Relevant Evidence
    ii. Identifying Methodological Characteristics of the Studies
    iii. Rating the Risk of Bias
    iv. Understanding Measures of Association
    v. Understanding Measures of Statistical Precision
    vi. Interpreting a Study
  C. Synthesizing Evidence: Formulating Evidence-based Conclusions
    i. Accounting for Conflicting Evidence
    ii. Knowing When to Perform a Meta-analysis
    iii. Wording Conclusions for Nontherapeutic Questions
    iv. Capturing Issues of Generalizability in the Conclusion
  D. Making Practice Recommendations
    i. Rating the Overall Confidence in the Evidence from the Perspective of Supporting Practice Recommendations
    ii. Putting the Evidence into a Clinical Context
    iii. Crafting the Recommendations
    iv. Basing Recommendations on Surrogate Outcomes
    v. Knowing When Not to Make a Recommendation
    vi. Making Suggestions for Future Research
Logistics of the AAN Guideline Development Process
  A. Distinguishing Types of AAN Evidence-based Documents
    i. Identifying the Three Document Types
    ii. Understanding Common Uses of AAN Systematic Reviews and Guidelines
  B. Nominating the Topic
  C. Collaborating with Other Societies
  D. Forming the Author Panel (Bias/Conflict of Interest)
  E. Revealing Conflicts of Interest
    i. Obtaining Conflict of Interest Disclosures
    ii. Identifying Conflicts That Limit Participation
    iii. Disclosing Potential Conflicts of Interest
  F. Undertaking Authorship
    i. Understanding Roles and Responsibilities
  G. Completing the Project Development Plan
    i. Developing Clinical Questions
    ii. Selecting the Search Terms and Databases
    iii. Selecting Inclusion and Exclusion Criteria
    iv. Setting the Project Timeline
  H. Performing the Literature Search
    i. Consulting a Research Librarian
    ii. Documenting the Literature Search
    iii. Ensuring the Completeness of the Literature Search: Identifying Additional Articles
    iv. Using Data from Existing Traditional Reviews, Systematic Reviews, and Meta-analyses
    v. Minimizing Reporting Bias: Searching for Nonpeer-reviewed Literature
  I. Selecting Articles
    i. Reviewing Titles and Abstracts
    ii. Tracking the Article Selection Process
    iii. Obtaining and Reviewing Articles
  J. Extracting Study Characteristics
    i. Developing a Data Extraction Form
    ii. Constructing the Evidence Tables
  K. Drafting the Document
    i. Getting Ready to Write
    ii. Formatting the Manuscript
  L. Reviewing and Approving Guidelines
    i. Stages of Review
  M. Taking Next Steps (Beyond Publication)
    i. Undertaking Dissemination
    ii. Responding to Correspondence
    iii. Updating Systematic Reviews and CPGs
Appendices
  i. Evidence-based Medicine Resources
  ii. Formulas for Calculating Measures of Effect
  iii. Classification of Evidence Matrices
  iv. Narrative Classification of Evidence Schemes
  v. Sample Evidence Tables
  vi. Tools for Building Conclusions and Recommendations
  vii. Clinical Contextual Profile Tool
  viii. Conflict of Interest Statement
  ix. Project Development Plan Worksheet
  x. Sample Data Extraction Forms
  xi. Manuscript Format
  xii. Sample Revision Table


    Preface

This manual provides instructions for developing evidence-based practice guidelines and related documents for the American Academy of Neurology (AAN). It is intended for members of the AAN's Guideline Development Subcommittee (GDS) and for facilitators and authors of AAN guidelines. The manual is also available to anyone curious about the AAN guideline development process, including AAN members and the public.

Clinical practice guidelines (CPGs) are statements that include recommendations intended to optimize patient care that are informed by a systematic review of evidence and an assessment of the benefits and harms of alternative care options.1

Although the goal of all practice guidelines is the same (to assist patients and practitioners in making health care decisions), different organizations use different methodologies to develop them. The AAN uses a strict evidence-based methodology that follows the Institute of Medicine's (IOM) standards for developing systematic reviews and CPGs.1,2 All AAN guidelines are based upon a comprehensive review and analysis of the literature pertinent to the specific clinical circumstance. The evidence derived from this systematic review informs a panel of experts who transparently develop the conclusions and recommendations of the CPG using a formal consensus development process.

This manual is divided into four sections. The first is a brief introduction to evidence-based medicine (EBM). This section closes with the rationale for the AAN's adoption of the EBM methodology for the development of its practice recommendations.

The second section is an in-depth description of the EBM process as applied by the AAN. It describes the technical aspects of each step of the process, from developing questions to formulating recommendations.

The third section of the manual describes the logistics of AAN guideline development. It details the intricacies of guideline development, from proposing a guideline topic to formatting and writing an AAN guideline for publication.

The last section consists of appendices of supportive materials, including tools useful for the development of an AAN guideline.

This manual gives an in-depth description of the process that the AAN employs for developing practice guidelines. It necessarily introduces many statistical and methodological concepts important to the guideline development process. However, this manual does not comprehensively review these topics. The reader is referred to appendix 1 for a list of resources providing further information on statistical and methodological topics.

1 Institute of Medicine of the National Academies. Clinical Practice Guidelines We Can Trust: Standards for Developing Trustworthy Clinical Practice Guidelines (CPGs). http://www.iom.edu/Reports/2011/Clinical-Practice-Guidelines-We-Can-Trust.aspx. Released March 23, 2011. Accessed August 11, 2011.

2 Institute of Medicine of the National Academies. Finding What Works in Health Care: Standards for Systematic Reviews. http://www.iom.edu/Reports/2011/Finding-What-Works-in-Health-Care-Standards-for-Systematic-Reviews.aspx. Released March 23, 2011. Accessed August 11, 2011.

EBM concepts are best introduced with a case such as the following example regarding ischemic stroke. A 55-year-old banker with a history of controlled hypertension is diagnosed with a small, left-hemispheric ischemic stroke. He has minimal post-stroke functional deficits. The usual stroke workup does not identify the specific cause. An echocardiogram shows no obvious embolic source but does demonstrate a patent foramen ovale (PFO). What is the best strategy to prevent another ischemic stroke in this patient?

Neurologists have varied and often strong opinions on the appropriate management of cryptogenic stroke patients with PFOs. Some would recommend closure of the PFO, as it is a potential source of paradoxical emboli. Others would consider the PFO incidental and unlikely to be causally related to the stroke.

DID YOU KNOW? The Three Pillars

Evidence is only one source of knowledge clinicians use to make decisions. Other sources include established Principles (for example, the neuroanatomic principles that enable neurologists to know precisely, just by examining the patient, that a patient has a lesion in the lateral medulla) and Judgment (the intuitive sense clinicians rely on to help them decide what to do when there is uncertainty). One of the goals of the EBM method of analysis is to distinguish explicitly between these sources of knowledge.

[Figure: the three pillars. Evidence, Principles, and Judgment together support a Recommendation.]


    Introduction to Evidence-based Medicine

Some would choose antiplatelet medications for secondary stroke prevention, whereas others would choose anticoagulation. Which treatment strategy is most likely to prevent another stroke?

Asking a question is the first step in the EBM process (see figure 1). To answer the PFO question, the EBM method would next require looking for strong evidence. So, what is evidence?

DID YOU KNOW?

It is important to remember that relative to AAN practice guidelines, the term evidence refers to information from studies of clinically important outcomes in patients with specific conditions undergoing specific interventions. Basic science studies, including animal studies, though providing important information in other contexts, are not formally considered in the development of practice guidelines.

Evidence in an EBM context is information from any study of patients with the condition who are treated with the intervention of interest and are followed to determine their outcomes. Evidence that would inform our question can be gained from studies of patients with cryptogenic stroke and PFO who undergo PFO closure or other therapy and are followed to determine whether they have subsequent strokes. For finding such studies the EBM method requires comprehensive searches of online databases such as MEDLINE. The systematic literature search maximizes the chance that we will find all relevant studies.

When a study is found, we need to determine the strength of the evidence it provides. For this purpose EBM provides validated rules that determine the likelihood that an individual study accurately answers a question. Studies likely to be accurate provide strong evidence. Rating articles according to the strength of the evidence provided is especially necessary when different studies provide conflicting results. For example, some studies of patients with cryptogenic PFO stroke might suggest that closure lowers stroke risk, whereas others might suggest that antiplatelet treatment is as effective as PFO closure. The study providing the strongest evidence should carry more weight.

After all the relevant studies have been found and rated, the next step in the EBM process is to synthesize the evidence to answer the question. Relative to PFO, after the literature has been comprehensively searched and all the studies have been rated, one would discover that no study provides strong evidence that informs the question as to the optimal therapy. The evidence is insufficient to support or refute the effectiveness of any of the proposed treatment strategies.

When faced with insufficient evidence to answer a clinical question, clinicians have no choice but to rely on their individual judgments. The absence of strong evidence is likely one of the reasons there is such practice variation relative to the treatment of PFO. Importantly, relative to our PFO question, the EBM process tells us that these treatment decisions are judgments; that is, they are merely informed opinions. No matter how strong the opinion, no one really knows which treatment strategy is more likely to prevent another stroke.

The all-too-common clinical scenario for which there is insufficient evidence to inform our questions highlights the rationale for the AAN's decision to rely on strict EBM methods for guideline development. In the case of insufficient evidence, such as the treatment of a patient with cryptogenic stroke and PFO, an expert panel's opinion on the best course of action could be sought. This would enable the making of practice recommendations on how to treat such patients. However, endorsing expert opinion in this way would result in the AAN's replacing the judgment of its members with the judgment of the expert panel. When such opinions are discussed in an AAN guideline they are clearly labeled as opinions.

To be sure, the AAN values the opinion of experts and involves them in guideline development. However, the AAN also understands that the neurologist caring for a patient has better knowledge of that patient's values and individual circumstances. When there is uncertainty, the AAN believes decisions are best left to individual physicians and their patients after both physicians and patients have been fully informed of the limitations of the evidence.

[Figure 1. The EBM Process: Question → Evidence → Conclusion → Recommendation]

DID YOU KNOW? Misconceptions Regarding EBM

There are several pervasive misconceptions regarding EBM. A common one is that EBM is "cookbook medicine" that attempts to constrain physician judgment. In fact, the natural result of the application of EBM methods is to highlight the limitations of the evidence and emphasize the need for individualized physician judgment in all clinical circumstances.


    EBM Process as Applied by the AAN

The EBM process used in the cryptogenic stroke and PFO scenario illustrates the flow of the EBM process (see figure 1) in the development of AAN practice guidelines. First, guideline authors identify one or more clinical question(s) that need(s) to be answered. The question(s) should address an area of quality concern, controversy, confusion, or practice variation.

Second, guideline authors identify and evaluate all pertinent evidence. A comprehensive literature search is performed. The evidence uncovered in the search is evaluated and explicitly rated on the basis of content and quality.

Third, the authors draw conclusions that synthesize and summarize the evidence to answer the clinical question(s).

Finally, the authors provide guidance to clinicians by systematically translating the conclusions of the evidence to action statements in the form of practice recommendations. The recommendations are worded and graded on the basis of the quality of supporting data and other factors, including the overall magnitude of the expected risks and benefits associated with the intervention.

The subsequent sections expand on each of these steps.

PITFALL

Many guidelines have been delayed for years because of poorly formulated questions.

DID YOU KNOW?

The first three steps of the EBM process (from question to conclusion) constitute the systematic review. If we stop at conclusions, we have not developed a guideline. Adding the additional step (from conclusions to recommendations) transforms the systematic review into a guideline.

Developing the Questions

Developing a question answerable from the evidence forms the foundation of the AAN's EBM process. The literature search strategy, evidence-rating scheme, and format of the conclusions and recommendations all flow directly from the question. Getting the questions right is critical.

Formulating an answerable clinical question is not a trivial step. It takes considerable thought and usually requires several iterations.

PICO Format

Clinical questions must have four components:

1. Population: The type of person (patient) involved

2. Intervention: The exposure of interest that the person experiences (e.g., therapy, positive test result, presence of a risk factor)

3. Co-intervention: An alternative type of exposure that the person could experience (e.g., no therapy, negative test result, absence of a risk factor; sometimes referred to as the control)

4. Outcome: The outcome(s) to be addressed

Population

The population usually consists of a group of people with a disease of interest, such as patients with Bell's palsy or patients with amyotrophic lateral sclerosis (ALS). The population of interest may also consist of patients at risk for a disease, for example patients with suspected multiple sclerosis (MS) or those at risk for stroke.

Often it is important to be very specific in defining the patient population. It may be necessary, for example, to indicate that the patient population is at a certain stage of disease (e.g., patients with new-onset Bell's palsy). Likewise, it may be necessary to indicate explicitly that the population of interest includes or excludes children.

DID YOU KNOW? The PICO Format

In the EBM world the necessity of formulating well-structured clinical questions is so ingrained that there is a mnemonic in common use: PICO. This helps to remind guideline developers of the need to explicitly define all four components of a clinical question. Some EBM gurus recommend adding two additional items to a clinical question: T for time, to explicitly indicate the time horizon one is interested in when observing the outcomes (e.g., disability at 3 months following a stroke); and S for setting, to identify the particular setting that is the focus of the question (e.g., community outpatient setting vs. tertiary hospital inpatient setting). PICO is thus sometimes expanded to PICOTS.

Intervention

The intervention defines the treatment or diagnostic procedure being considered. The question almost always asks whether this intervention should be done. An example is, should patients with new-onset Bell's palsy be treated with steroids?

An example from the perspective of a diagnostic consideration would be: Should patients with new-onset Bell's palsy routinely receive brain imaging?

More than one intervention can be explicitly or implicitly included in the question. An example is, in patients with ALS, which interventions improve sialorrhea? This more general question implies that authors will look at all potential interventions for treating sialorrhea.

It may be important to be highly specific in defining the intervention. For example, authors might indicate a specific dose of steroids for the Bell's palsy treatment of interest. Likewise, authors might choose to limit the question to steroids received within the first 3 days of palsy onset.


The way the interventions are specifically defined in the formulation of the question will determine which articles are relevant to answering the question.

Co-intervention

The co-intervention is the alternative to the intervention of interest. For therapeutic questions the co-intervention could be no treatment (or placebo) or an alternative treatment (e.g., L-3,4-dihydroxyphenylalanine [L-DOPA] vs. dopamine agonists for the initial treatment of Parkinson disease [PD]). For a population screening question, the alternative is not to screen.

The co-intervention is a bit more difficult to conceptualize for prognostic or diagnostic questions. Here the intervention is often something that cannot be actively controlled or altered. Rather, it is the result of a diagnostic test (e.g., the presence or absence of 14-3-3 protein in the spinal fluid of a patient with suspected prion disease) or the presence or absence of a risk factor (e.g., the presence or absence of a pupillary light response at 72 hours in a patient postcardiac arrest). Relative to a prognostic question, the co-intervention is the alternative to the presence of a risk factor: the absence of that risk factor. Likewise, for a diagnostic test, the alternative to the intervention (a positive test result) is a negative test result.

Of course, there are circumstances where there may be many alternatives. The initial treatment of PD, for example, could commence with L-DOPA, a dopamine agonist, or a monoamine oxidase B (MAO-B) inhibitor.

Finally, it is important to realize that there are times when the co-intervention is implied rather than explicitly stated in the question. The following is an example:

In patients with Bell's palsy, does prednisolone given within the first 3 days of onset of facial weakness improve the likelihood of complete facial functional recovery at 6 months?

Here the co-intervention is not stated but implied. The alternative to prednisolone in this question is no prednisolone.

Outcomes

The outcomes to be assessed should be clinically relevant to the patient. Indirect (or surrogate) outcome measures, such as laboratory or radiologic results, should be avoided, if doing so is feasible, because they often do not predict clinically important outcomes. Many treatments reduce the risk for a surrogate outcome but have no effect, or have harmful effects, on clinically relevant outcomes; some treatments have no effect on surrogate measures but improve clinical outcomes. In unusual circumstances, when surrogate outcomes are known to be strongly and causally linked to clinical outcomes, they can be used in developing a practice recommendation. (See the section on deductive inferences.)

When specifying outcomes it is important to specify all of the outcomes that are relevant to the patient population and intervention. For example, the question might deal with the efficacy of a new antiplatelet agent in preventing subsequent ischemic strokes in patients with noncardioembolic stroke. Important outcomes needing explicit consideration include the risk of subsequent ischemic stroke (both disabling and nondisabling), death, bleeding complications (both major and minor), and other potential adverse events. Every clinically relevant outcome should be specified. When there are multiple clinically important outcomes it is often helpful at the question development stage to rank the outcomes by degrees of importance. (Specifying the relative importance of outcomes will be considered again when assessing our confidence in the overall body of evidence.)

In addition to defining the outcomes that are to be measured, the clinical question should state when the outcomes should be measured. The interval must be clinically relevant; for chronic diseases, outcomes that are assessed after a short follow-up period may not reflect long-term outcome.

Questions should be formulated so that the four PICO elements are easily identified. The following is an example:

Population: For patients with Bell's palsy
Intervention: do oral steroids given within the first 3 days of onset
Co-intervention: as compared with no steroids
Outcome: improve long-term facial functional outcomes?
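Because a well-formed question always has the same named parts, it can be convenient to capture PICO(TS) questions in a small structured record. The following sketch is purely illustrative (the ClinicalQuestion class and its field names are our own invention, not part of the AAN process), shown in Python with the Bell's palsy question above:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ClinicalQuestion:
    """A PICO(TS) clinical question; field names mirror the manual's terms."""
    population: str
    intervention: str
    co_intervention: str
    outcome: str
    time_horizon: Optional[str] = None  # the optional "T" in PICOTS
    setting: Optional[str] = None       # the optional "S" in PICOTS

    def as_sentence(self) -> str:
        return (f"For {self.population}, does {self.intervention}, "
                f"as compared with {self.co_intervention}, "
                f"improve {self.outcome}?")

question = ClinicalQuestion(
    population="patients with Bell's palsy",
    intervention="oral steroids given within the first 3 days of onset",
    co_intervention="no steroids",
    outcome="long-term facial functional outcomes",
)
print(question.as_sentence())
```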

Types of Clinical Questions

There are several distinct subtypes of clinical questions. The differences among question types relate to whether the question is primarily of a therapeutic, prognostic, or diagnostic nature. Recognizing the different types of questions is critical to guiding the process of identifying evidence and grading its quality.

Therapeutic

The easiest type of question to conceptualize is the therapeutic question. The clinician must decide whether to use a specific treatment. The relevant outcomes of interest are the effectiveness, safety, and tolerability of the treatment. The strongest study type for determining the effectiveness of a therapeutic intervention is the masked, randomized, controlled trial (RCT).

Diagnostic and Prognostic Accuracy

There are many important questions in medicine that do not relate directly to the effectiveness of an intervention in improving outcomes. Rather than deciding to perform an intervention to treat a disease, the clinician may need to decide whether he or she should perform an intervention to determine the presence or prognosis of the disease. The relevant outcome for these questions is not the effectiveness of the intervention for improving patient outcomes. Rather, the outcome relates to improving the clinician's ability to predict the presence of the disease or the disease prognosis. The implication of these questions is that improving clinicians' ability to diagnose and prognosticate indirectly translates to improved patient outcomes.

For example, a question regarding prognostic accuracy could be worded, for patients with new-onset Bell's palsy, does measuring the amplitude of the facial compound motor action potential predict long-term facial outcome? The intervention of interest in this question is clearly apparent: facial nerve conduction studies. The outcome is also apparent: an improved ability to predict the patient's long-term facial functioning. Having the answer to this question would go a long way in helping clinicians to decide whether they should offer facial nerve conduction studies to their patients with Bell's palsy.

An RCT would not be the best study type for measuring the accuracy of facial nerve conduction studies for determining prognosis in Bell's palsy. Rather, the best study type would be a prospective, controlled, cohort survey of a population of patients with Bell's palsy who undergo facial nerve conduction studies early in the course of their disease and whose facial outcomes are determined in a masked fashion after a sufficiently long follow-up period.


Questions of diagnostic accuracy follow a format similar to that of prognostic accuracy questions. For example, for patients with new-onset peripheral facial palsy, does the presence of decreased taste of the anterior ipsilateral tongue accurately identify those patients with Bell's palsy? The intervention of interest is testing ipsilateral taste sensation. The outcome of interest is the presence of Bell's palsy as determined by some independent reference. (In this instance the reference standard would most likely consist of a case definition that included imaging to rule out other causes of peripheral facial palsy.)

As with questions of prognostic accuracy, the best study type to determine the accuracy of decreased taste sensation for identifying Bell's palsy would be a prospective, controlled, cohort survey of a population of patients presenting with peripheral facial weakness who all had taste sensation tested and who all were further studied to determine whether they in fact had Bell's palsy, using the independent reference standard. If such a study demonstrated that testing taste sensation was highly accurate in distinguishing patients with Bell's palsy from patients with other causes of peripheral facial weakness, we would recommend that clinicians routinely test taste in this clinical setting.

Population Screening

There is another common type of clinical question worth considering. These questions have a diagnostic flavor but are more concerned with diagnostic yield than with diagnostic accuracy. This type of question is applicable to the situation where a diagnostic intervention of established accuracy is employed. An example is, in patients with new-onset peripheral facial palsy, should a physician routinely obtain a head MRI to identify sinister pathology within the temporal bone causing the facial palsy? There is no concern with regard to the diagnostic accuracy of head MRI in this situation. The diagnostic accuracy of MRI in revealing temporal bone pathology is established. The clinical question here is whether it is useful to routinely screen patients with facial palsy with a head MRI. The outcome of interest is the yield of the procedure: the frequency with which the MRI reveals clinically relevant abnormalities in this patient population. The implication is that if the yield were high enough, clinicians would routinely order the test.

The best evidence source to answer this question would consist of a prospective study of a population-based cohort of patients with Bell's palsy who all undergo head MRI early in the course of their disease.

Causation

Occasionally, a guideline asks a question regarding the cause-and-effect relationship of an exposure and a condition. Unlike diagnostic and prognostic accuracy questions, which look merely for an association between a risk factor and an outcome, causation questions seek to determine whether an exposure causes a condition. An example is, does chronic repetitive motion cause carpal tunnel syndrome? Another example is, does natalizumab cause progressive multifocal leukoencephalopathy? The implication is that avoidance of the exposure would reduce the risk of the condition. As in these examples, causation most often relates to questions of safety.

Theoretically, as with therapeutic questions, the best evidence source for answering causation questions is the RCT. However, in many circumstances, for practical and ethical reasons an RCT cannot be done to determine causation. The outcome may be too uncommon for an RCT to be feasible. There may be no way to randomly assign patients to varying exposures. In these circumstances, the best evidence source for causation becomes a cohort survey in which patients with and patients without the exposure are followed to determine whether they develop the condition. Critical to answering the question of causation in this type of study is strictly controlling for confounding differences between those exposed and those not exposed.

Determining the type of question early in guideline development is critical for directing the process. The kind of evidence needed to answer the question and the method for judging a study's risk of bias follow directly from the question type.

Development of an Analytic Framework

Fundamentally, all CPGs attempt to answer the question, for this patient population does a specific intervention improve outcomes? The goal is to find evidence that directly links the intervention with a change in outcomes. When such direct evidence is found, it is often a straightforward exercise to develop conclusions and recommendations. When direct evidence linking the intervention to the outcome is not found, it may be necessary to explicitly develop an analytic framework to help define the types of evidence needed to link the intervention to patient-relevant outcomes.

As a case in point, consider myotonic dystrophy (MD). Patients with MD are known to be at increased risk for cardiac conduction abnormalities. The question posed is, does routinely looking for cardiac problems in patients with MD decrease the risk that those patients will have heart-related complications such as sudden death? One type of analytic framework that can be constructed is a decision tree. Figure 2 graphically depicts the factors that contribute to a decision that must be made (indicated by the black square, a decision node, at the base of the sideways tree). If we do not screen, the patient might or might not develop a cardiac conduction problem that leads to cardiac death (this probability is depicted by black circles, chance nodes). If we screen, the patient also has a chance of cardiac death (another chance node in figure 2), but presumably this chance would be decreased by some degree, because we have identified patients at increased risk for cardiac death and treated them appropriately (perhaps placing a pacemaker after identifying heart block on a screening EKG). The probability that screening will identify an abnormality (Pi), conduction block on an EKG, multiplied by a measure of the effectiveness of placing a pacemaker in reducing the risk of cardiac death in patients with conduction block (RRrx), should tell us how much the risk of cardiac death is reduced with screening in patients with MD.

[Figure 2. A Decision Tree. A decision node branches to "Screen" and "No screen." Each branch ends in a chance node with outcomes "cardiac death" and "no cardiac death": probability Ps and 1 - Ps on the screening branch, Pn and 1 - Pn on the no-screening branch, with Ps derived from Pn, Pi, and RRrx.]
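The arithmetic the tree encodes can be sketched in a few lines. All numbers below are invented for illustration, and we assume the pacemaker's effectiveness term is expressed as an absolute risk reduction among the patients whom screening identifies; this is a sketch of the tree's logic, not a clinical calculation:

```python
# Hypothetical inputs -- invented for illustration, not taken from the manual.
p_death_no_screen = 0.05  # Pn: probability of cardiac death without screening
p_identified = 0.30       # Pi: probability screening finds conduction block
arr_pacemaker = 0.10      # assumed absolute risk reduction from pacing
                          # among identified patients

# Per the tree's logic, the benefit of screening is the probability of
# identifying an abnormality multiplied by the treatment's effectiveness.
risk_reduction = p_identified * arr_pacemaker        # 0.03
p_death_screen = p_death_no_screen - risk_reduction  # Ps = 0.02

print(f"Risk without screening (Pn):   {p_death_no_screen:.3f}")
print(f"Risk with screening (Ps):      {p_death_screen:.3f}")
print(f"Absolute benefit of screening: {risk_reduction:.3f}")
```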

Direct evidence for a link between screening and reduced cardiac death would be provided by a study, ideally an RCT, that compares cardiac outcomes in patients with MD who are screened with those in patients with MD who are not screened. If such evidence does not exist (which is probably the case) the analytic framework of the decision tree helps CPG producers identify alternative questions (and different evidence types) that might inform the decision. For example, one could find a study in which all patients with MD were


routinely screened with EKG and in which the percentage of patients with conduction block was reported. One might also find a separate study that reports the effectiveness of pacemaker placement in reducing the risk of cardiac death in patients with MD with conduction block. Using these evidence sources and the analytic framework enables a linking of the intervention and outcome. Such analyses often suggest to guideline developers other helpful clinical questions to be asked. Rather than simply asking the therapeutic question directly linking intervention to outcome:

For patients with MD, does routine screening with EKG (as compared with not routinely screening) reduce the risk of sudden cardiac death?

Guideline developers will also ask these questions:

For patients with MD, how often does routine EKG screening (vs. no screening) identify patients with conduction block?

For patients with MD and conduction block, does pacemaker placement (vs. no placement) reduce the risk of cardiac death?

Of course, in this example there are other potentially important outcomes to be considered, such as complications related to pacemaker placement. All important outcomes should be considered.

An analytic framework increases the likelihood that the systematic review will identify studies whose evidence, when analyzed, will answer the underlying clinical question by suggesting related questions. Additionally, the framework aids in the identification of all important outcomes.

A decision tree is one tool that is commonly used to develop an analytic framework; a causal pathway is another. Figure 3 illustrates a causal pathway used to assist in developing questions for a guideline regarding the diagnostic accuracy of tests for carpal tunnel syndrome. Regardless of the tool chosen, it is worth taking the time to use an analytic framework to help define and refine the clinical questions.

Finding and Analyzing Evidence

Finding the Relevant Evidence

A comprehensive literature search distinguishes the systematic review that forms the basis of an AAN guideline from standard review articles. The comprehensive search is performed to ensure, as much as possible, that all relevant evidence is considered. This helps to reduce the risk of bias being introduced into the process. Authors are not allowed to choose which articles they want to include (as they may select those articles that support their preconceptions). Rather, all relevant evidence is considered.

The most commonly searched database is MEDLINE. Other medical databases are also used (this is discussed further in the logistics section).

The initial literature search is crafted (usually with the help of a research librarian) so as to cast a wide net to ensure that relevant articles are not missed. Content experts play an important role in this step: on the basis of their knowledge of the literature they identify a few key articles they know are relevant to each of the clinical questions. These key articles are used to validate the search. If the key articles are missed in the search, the search strategy must be revised.

After completing a comprehensive search, authors use a two-step process (see figure 4) to identify relevant studies. First, authors review the titles and abstracts from the comprehensive search to exclude citations that are obviously irrelevant to the question. Second, authors review the full text of the included titles and abstracts against prespecified inclusion and exclusion criteria. The studies meeting the inclusion and exclusion criteria form the evidence source of the guideline.

DID YOU KNOW?

Studies are included even when the guideline panel members doubt the veracity of the results. A critical assumption built into the EBM process is that investigators do not lie about or fabricate data. Unless there is direct evidence of scientific misconduct (in which case the study would likely be retracted), every study is included and analyzed using the same rules.

A secondary search of the references from review articles identified in the initial search should be made to identify any relevant studies that may have been missed.

For transparency, it is important to keep track of excluded articles and the reasons for their exclusion. After completing article selection, the authors construct a diagram depicting the flow of articles through the process, including the number excluded (see figure 5). This diagram is included in the final (published) guideline.

The identified studies meeting inclusion criteria form the evidence base that informs the review.

Identifying Methodological Characteristics of the Studies

After the studies are identified, it is necessary to extract essential characteristics of each of the studies selected for inclusion.

[Figure 3. A Causal Pathway. Underlying conditions (osteoarthritis, acromegaly, amyloidosis, inflammation) lead to an enlarged median nerve or a narrowed carpal tunnel, which leads to median nerve compression, then median nerve dysfunction, then symptoms and signs. The pathway also depicts non-CTS causes of median nerve dysfunction and other neurologic and non-neurologic causes of symptoms, with wrist x-ray, wrist ultrasound, electrodiagnostics, and clinical criteria each testing a different link in the pathway.]


These extracted characteristics will be used to assess each study's strength.

The characteristics of each study will be included in a master (evidence) table. This table succinctly summarizes each study, including characteristics relevant to generalizability, risk of bias, and patient outcomes.

Elements Relevant to Generalizability

Authors should extract from the studies those elements that inform the judgment of each study's relevance to the clinical question and the generalizability of the results. These elements can be directly related to aspects of the clinical question.

Elements relating to the patient population should include the following:
- Source of patients (e.g., neuromuscular referral center)
- Inclusion criterion used in the study to determine the presence of the condition of interest
- Age of the patients (e.g., mean and standard deviation)
- Gender of the included population (e.g., proportion female)

Elements relevant to the intervention and co-intervention should also be routinely extracted. These will be highly dependent on the clinical question but could include the following:
- Dose of medication used
- Timing of the intervention
- Nature of the diagnostic test (e.g., CT vs. MRI)

[Figure 4. Two-step Literature Review Process. Step 1: review abstracts (e.g., 100 articles identified from the initial search). Step 2: review the full text of those passing step 1 (e.g., 8 relevant articles identified).]

Elements relevant to the way the study measured outcomes should also be included. These will also vary from question to question but could include the following:
- Scale used to determine the outcome (e.g., global impression of change, House-Brackmann vs. Adour-Swanson scale of facial function)
- Duration of follow-up

Quality-of-Evidence Indicators

Beyond the elements pertaining to generalizability, quality-of-evidence indicators should also be extracted. The items extracted will vary according to the question type.

For therapeutic questions, critical elements include the following:
- Use of a comparison (control) group
- Method of treatment allocation (randomized versus other)
- Method of allocation concealment
- Proportion of patients with complete follow-up
- Use of intent-to-treat methodologies
- Use of masking throughout the study (single-blind, double-blind, independent assessment)

For diagnostic or prognostic accuracy questions, important elements to be included are the following:
- Study design (case control versus cohort survey)
- Spectrum of patients included (narrow spectrum versus wide spectrum)
- Proportion of patients for whom both the predictor and the outcome variable are measured
- Objectiveness of the outcome variable, and whether the outcome variable is measured without knowledge of the predictor variable

For screening questions, critical elements include the following:
- Study design (prospective vs. retrospective)
- Setting (population based, clinic based, or referral center based)
- Sampling method (selected or statistical)
- Completeness (all patients in the cohort underwent the intervention of interest)
- Masking (interpretation of the diagnostic test of interest was performed without knowledge of the patient's clinical presentation)

For causation questions, critical elements include the following:
- Study design (prospective vs. retrospective)
- Setting (population based, clinic based, or referral center based)
- Sampling method (selected or statistical)
- Completeness (all patients in the cohort underwent the intervention of interest)
- Masking (interpretation of the diagnostic test of interest was performed without knowledge of the patient's clinical presentation)
- The presence of confounding differences between those with and those without the putative causative factor

[Figure 5. Flow Diagram Documenting Disposition of Articles During the Systematic Review. Articles identified by the literature search: 769. Articles deemed irrelevant on title and abstract review: 448. Articles deemed potentially relevant after reviewing titles and abstracts: 321. Excluded on full-text review: 276 articles not meeting inclusion criteria and 21 review articles without original data. Articles meeting inclusion criteria after full-text review: 24. Articles identified from references: 8. Final number of articles included in the analysis: 32.]
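The counts in such a flow diagram should reconcile at every step. A short bookkeeping sketch in Python (using the counts from figure 5; the variable names are ours) shows one way to track the disposition of articles and verify that the numbers add up:

```python
# Counts taken from figure 5.
identified_by_search = 769
excluded_on_title_abstract = 448  # deemed irrelevant in step 1

potentially_relevant = identified_by_search - excluded_on_title_abstract

excluded_on_full_text = {
    "did not meet inclusion criteria": 276,
    "review article without original data": 21,
}
meeting_inclusion = potentially_relevant - sum(excluded_on_full_text.values())

identified_from_references = 8
final_included = meeting_inclusion + identified_from_references

assert potentially_relevant == 321  # after title/abstract review
assert meeting_inclusion == 24      # after full-text review
assert final_included == 32
print(f"Final number of articles included in the analysis: {final_included}")
```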


Patient Relevant Outcome Measures

Finally, patient relevant outcomes need to be extracted. These consist of a quantitative measure of what happened to patients within the study. For example, for a therapeutic question, how many patients improved? For a diagnostic question, how many patients had the disease?

Regardless of the question type, clinically relevant outcomes are usually best measured by using discrete, categorical variables rather than continuous variables. For example, the proportion of patients with Bell's palsy who have complete facial functional recovery is a more easily interpreted measure of patient outcome than the overall change in the median values of the House-Brackmann facial function score.

Measuring patient outcomes using categorical variables involves counting patients. An example is, how many patients on drug X improved, and how many did not improve? Counting patients in this manner often enables construction of a contingency table. Table 1 is a simple two-by-two contingency table showing the numbers of patients improving on drug X versus placebo.

Table 1. Contingency Table

Treatment   Improved   Not Improved
Drug X          13           32
Placebo          6           78

From this it is a relatively straightforward process to calculate numeric values that express the strength of association between the intervention and the outcome. Examples are the relative risk of a poor outcome in treated patients versus untreated patients (the proportion of treated patients with a poor outcome divided by the proportion of untreated patients with a poor outcome) or the poor-outcome risk difference (the proportion of treated patients with a poor outcome minus the proportion of untreated patients with a poor outcome).
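Applying those definitions to the counts in table 1, a minimal sketch in Python (the variable names are ours):

```python
# Counts from table 1.
drug_improved, drug_not_improved = 13, 32
placebo_improved, placebo_not_improved = 6, 78

# Proportion with a poor outcome (not improved) in each group.
risk_poor_drug = drug_not_improved / (drug_improved + drug_not_improved)  # 32/45
risk_poor_placebo = placebo_not_improved / (
    placebo_improved + placebo_not_improved)                              # 78/84

relative_risk = risk_poor_drug / risk_poor_placebo    # ~0.77
risk_difference = risk_poor_drug - risk_poor_placebo  # ~-0.22

print(f"Relative risk of a poor outcome: {relative_risk:.2f}")
print(f"Risk difference:                 {risk_difference:+.2f}")
```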

Two-by-two contingency tables can also be constructed for nontherapeutic studies. For studies regarding prognosis and causation, relative risks and risk differences can also be calculated. Rather than grouping patients according to whether they received treatment, patients are grouped according to whether they had the risk factor of interest.

Quantitative measures of diagnostic accuracy can also be derived from a contingency table. These include sensitivities and specificities as well as positive and negative predictive values and likelihood ratios.
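As a sketch, the standard accuracy measures can be computed from a two-by-two table of test result against an independent reference standard (the counts below are invented for illustration):

```python
# Hypothetical counts: test result vs. independent reference standard.
tp, fp = 40, 10  # test positive: disease present / disease absent
fn, tn = 5, 45   # test negative: disease present / disease absent

sensitivity = tp / (tp + fn)                   # 0.889
specificity = tn / (tn + fp)                   # 0.818
ppv = tp / (tp + fp)                           # positive predictive value
npv = tn / (tn + fn)                           # negative predictive value
lr_positive = sensitivity / (1 - specificity)  # ~4.9
lr_negative = (1 - sensitivity) / specificity  # ~0.14

for name, value in [("Sensitivity", sensitivity), ("Specificity", specificity),
                    ("PPV", ppv), ("NPV", npv),
                    ("LR+", lr_positive), ("LR-", lr_negative)]:
    print(f"{name}: {value:.3f}")
```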

Finally, the quantitative measure used to describe population screening studies is simply the yield, that is, the proportion of patients undergoing the test of interest who are found to have the condition.

Sometimes authors of the studies being considered might not report patient outcomes using categorical outcome variables. In such circumstances, if sufficient information is provided, panel members themselves should attempt to construct contingency tables. If contingency tables cannot be constructed, panel members should report the quantitative outcome measure(s) as reported in the original studies. Guideline authors are encouraged to make these determinations with the help of the facilitator or the methodological experts on the GDS.

Rating the Risk of Bias

An important step in guideline development is to measure the risk of bias in each included study. Bias, or systematic error, is the study's tendency to measure the intervention's effect on the outcome inaccurately. It is not possible to measure the bias of a study directly. (If it were, it would imply we already knew the answer to the clinical question.) However, using well-established principles of good study design, we can estimate a study's risk of bias.

For AAN guidelines, the risk of bias in studies is measured using a four-tiered classification scheme (see appendices 3 and 4). In this scheme, studies graded Class I are judged to have a low risk of bias, studies graded Class II are judged to have a moderate risk of bias, studies graded Class III are judged to have a moderately high risk of bias, and studies graded Class IV are judged to have a very high risk of bias. The classification rating is also known as the level of evidence.

TIP: Appendix 2 provides formulas for calculating commonly used measures of association such as the relative risk. Additionally, the companion spreadsheet will calculate this for you and is available at www.aan.com/guidelines.

Panel members assign each study a classification on the basis of that study's extracted quality-of-evidence characteristics.

The classification scheme the AAN employs accounts only for systematic error. Random error (low study power) is dealt with separately.

A study's risk of bias can be judged only relative to a specific clinical question. The standards that are applied vary among the different question types: therapeutic, diagnostic or prognostic accuracy, screening, and causation.

Appendix 4 describes in paragraph form the study characteristics needed to attain the various risk-of-bias grades. The next five sections explain in more detail each study characteristic (or element) that contributes to a study's final classification for each of the five study types (therapeutic, diagnostic, prognostic, screening, and causation).

Classifying Evidence for Therapeutic Questions

Important elements for classifying the risk of bias in therapeutic articles are described below.

Comparison (Control) Group

A comparison (or control) group in a therapeutic study consists of a group of patients who did not receive the treatment of interest. Studies without a comparison group are judged to have a high risk of bias and are graded Class IV.

To be graded Class I or Class II, studies should use concurrent controls. Studies using nonconcurrent controls, such as those using patients as their own controls (e.g., a before-after design) or those using external controls, are graded Class III.

DID YOU KNOW?

Sometimes a study provides evidence relevant to more than one question. Often in these circumstances the study will have different ratings. For example, a study could be rated Class I for a therapeutic question and Class III for a separate, prognostic question.

Treatment Allocation

To reduce the risk of bias, authors of a therapeutic article must ensure that treated and untreated patient groups are similar in every way except for the intervention of interest. In other words, known and unknown confounding differences between the treated and untreated groups must be minimized.

Randomized allocation to treatment and comparison groups is the best way to


minimize these confounding differences. Thus, to be graded Class I, a therapeutic study should have randomly allocated patients.

DID YOU KNOW?

The effect of allocation concealment on a study's accuracy has been well established. As it happens, poor allocation concealment introduces more bias into a study than failure to mask outcome assessment.

An important study characteristic that ensures patients are truly randomly allocated to different strategies is concealed allocation. Concealed allocation prevents investigators from manipulating treatment assignment. Examples of concealed allocation include use of consecutively numbered, sealed, opaque envelopes containing a predetermined, random sequence for treatment assignment and use of an independent center that an investigator contacts to obtain the treatment assignment. By comparison, examples of unconcealed allocation include flipping a coin (e.g., heads = treatment A, tails = treatment B) and assigning patients to treatment categories on the basis of the date (e.g., treatment A on odd-numbered days, treatment B on even-numbered days). These unconcealed allocation methods can be easily manipulated to control treatment allocation. For example, the coin can be flipped again, or the patient can be told to come back the next day.

In addition to description of concealed allocation, Class I rating requires that panel members ensure that the randomization scheme effectively balanced the treatment and comparison groups for important confounding baseline differences. In most studies the important characteristics of each treatment group are summarized in a table (usually the first table in an article describing an RCT). If important baseline differences exist, any differences in outcomes between the different treatment groups might be explained by these baseline differences rather than by any treatment effect.

Occasionally, panel members will encounter an article in which investigators attempt to match each treated patient with an untreated comparison patient with similar baseline characteristics rather than randomly assign patients to treatment or comparison groups. Such matched studies are graded Class II.

Completeness of Follow-up

Patients enrolled in studies are sometimes lost to follow-up. Such losses occur for nonrandom reasons and may introduce confounding differences between the treated and untreated groups. Thus, Class I rating requires that more than 80% of patients within the study have completed follow-up.

For various reasons, sometimes patients initially assigned to the treatment group do not receive treatment, and patients assigned to the comparison group receive treatment. If patients cross over from the treated group to the comparison group or from the comparison group to the treated group, confounding differences can be introduced. When this happens, it is important that the investigators analyze the results using intent-to-treat principles. Put simply, such principles entail analysis of the results on the basis of whichever group (treatment or comparison) to which each patient was originally assigned.
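A minimal sketch of the intent-to-treat principle, using hypothetical records (the field names and data are illustrative only): each patient is analyzed in the arm to which he or she was originally assigned, even after a crossover.

    # Hypothetical records: (assigned_group, group_actually_received, good_outcome)
    patients = [
        ("treatment", "treatment", True),
        ("treatment", "comparison", False),   # crossover: still analyzed as "treatment"
        ("comparison", "comparison", False),
        ("comparison", "treatment", True),    # crossover: still analyzed as "comparison"
    ]

    def itt_outcome_rate(records, group):
        """Rate of good outcomes, grouping by original assignment, not receipt."""
        assigned = [r for r in records if r[0] == group]
        return sum(r[2] for r in assigned) / len(assigned)

    print("ITT treatment-arm rate:", itt_outcome_rate(patients, "treatment"))
    print("ITT comparison-arm rate:", itt_outcome_rate(patients, "comparison"))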

DID YOU KNOW?
The selection of an 80% completion rate is an arbitrary one. This measure of a study's quality is best understood when positioned on a continuum: the fewer patients lost to follow-up, the better. However, to fit a study into the ordinal Class I through IV system, a cutoff had to be selected. The 80% cutoff was suggested by David Sackett, OC, FRSC, a pioneer of EBM.1

Masking
For a study to be graded Class I or II, an investigator who is unaware of the patient's original treatment assignment must determine the outcome. This is termed masked or blinded outcome assessment.

1. Sackett DL, Rosenberg WMC, Muir Gray JA, Haynes RB, Richardson WS. Evidence-based medicine. BMJ 1996;312:71.

PITFALL
It is important not to confuse allocation concealment with study masking (or blinding). Allocation concealment refers only to how investigators randomize patients to different treatments. After patients have been randomized, masking ensures that the investigators are not aware of which treatment a patient is receiving.

For a study to be graded Class III, a study investigator who is not one of the treating providers must determine the outcome. Such independent outcome assessment, although not as effective in reducing bias as masking, nonetheless has been shown to be less bias prone than having the unmasked treating physician determine the outcome. A patient's own assessment of his or her outcome (e.g., a seizure diary or completion of a quality-of-life questionnaire) fulfills the criteria for independent assessment.

The requirement for masked or independent assessment can be waived if the outcome measure is objective. An objective outcome is one that is unlikely to be affected by observer expectation bias (e.g., patient survival or a laboratory assay). Oftentimes determining whether an outcome is objective requires some judgment by the panel members. The final determination of objectiveness of any outcome is made by the AAN GDS.

Active Control Equivalence and Noninferiority Trials
Some therapeutic studies compare the efficacy of a new treatment with that of another standard treatment rather than placebo. Additional requirements are imposed on these trials.

To ensure that the new drug is being compared with an efficacious drug, there must be a previous Class I placebo-controlled trial establishing efficacy of the standard treatment.

Additionally, the standard treatment must be used in a manner that is substantially similar to that used in previous studies (Class I placebo-controlled trials) establishing efficacy of the standard treatment (e.g., for a drug, the mode of administration, dose, and dosage adjustments are similar to those previously shown to be effective).

Furthermore, the inclusion and exclusion criteria for patient selection and the outcomes of patients receiving the standard treatment must be substantially equivalent to those of a previous Class I placebo-controlled study establishing efficacy of the standard treatment.

Finally, the interpretation of the study results is based on an observed-cases analysis.

Classifying Evidence for Diagnostic or Prognostic Accuracy Questions
The following paragraphs present important elements to be considered when classifying evidence for a diagnostic or prognostic accuracy question.

Comparison (Control) Group
To be useful, a study of prognostic or diagnostic accuracy should include patients with and patients without the disease or outcome of interest. Quantitative measures of accuracy cannot be calculated from studies without a comparison group. Studies lacking a comparison group are judged to have a high risk of bias and are graded Class IV.

Study Design
A Class I study of diagnostic or prognostic accuracy would be a prospective cohort survey. Investigators would start with a group of patients suspected of having a disease (the cohort). The diagnostic test would be performed on this cohort. Some patients in the cohort would have positive test results, others negative test results. The actual presence or absence of the disease in the cohort would be determined by an independent reference standard (the gold standard). Quantitative measures of the diagnostic accuracy of the test (or predictor), such as the sensitivity or specificity, could then be calculated.

In studies of diagnostic accuracy, the steps that are followed in prognostic accuracy studies are often performed in reverse. Investigators do not start with a group of patients suspected of having the disease; rather, they select a group of patients who clearly have the disease (cases) and a group of patients who clearly do not (controls). The test is then performed on both cases and controls, and measures of diagnostic accuracy are calculated. Although such case control studies are often easier to execute than cohort studies, this design introduces several potential biases. Thus, at best, such studies can be graded only Class II.

DID YOU KNOW?
Outcome objectiveness can be ranked into three tiers:
Level One: The unmasked investigator and unmasked patient cannot influence the measurement of the outcome (e.g., death, missing body part, serum glucose level).
Level Two: Either the unmasked investigator or the unmasked patient (but not both) can influence the measurement of the outcome (e.g., unmasked investigator: blood pressure measurement, MRI lesion burden; unmasked patient: seizure diary, caretaker assessment).
Level Three: Both the unmasked patient and the unmasked investigator could influence the measurement of the outcome (e.g., Unified Parkinson's Disease Rating Scale [UPDRS] score, visual analog scale score, seizure scale score).
For AAN guidelines, usually only those measures meeting Level One criteria are considered objective.

PITFALL
The term case control is commonly misinterpreted. Many studies have controls. The term case control study, however, is reserved specifically for studies wherein investigators select patients because they have the outcome of interest (e.g., the disease) or because they do not have the outcome of interest. The former are the cases; the latter are the controls.

Data Collection
For a cohort study, data collection can be prospective or retrospective. In a prospective cohort study both data collection and the study itself begin before any of the patients has experienced the outcome. In a retrospective cohort study, both data collection and the study itself start after some or all of the patients have attained the outcome of interest. Retrospective data collection introduces potential bias because the investigators usually have to rely on data sources (e.g., medical records) that were not designed for the study's specific purpose. Studies with prospective data collection are eligible for a Class I rating, whereas those using retrospective data collection are at best Class II.

Patient Spectrum
One of the dangers of the case control design is that such studies sometimes include either only patients who clearly have the disease or only those who clearly do not. Including such unambiguous cases can exaggerate the diagnostic accuracy of the test. To avoid this, it is important for a study employing a case control design to include a wide spectrum of patients. A wide-spectrum study would include patients with mild forms of the disease and patients with clinical conditions that could be easily confused with the disease. A narrow-spectrum study would include only patients who clearly had the disease and controls who clearly did not. Studies employing a case control design with a wide spectrum of patients can be graded Class II, and those with a narrow spectrum, Class III.

Cohort studies have a lower risk of spectrum bias than case control studies. Occasionally, spectrum bias can be introduced into a cohort study if only patients with extreme results of the diagnostic test (or risk factor) are included. For example, a study of the diagnostic accuracy of CSF 14-3-3 for prion disease would introduce spectrum bias if it included only patients with high 14-3-3 levels and patients with low 14-3-3 levels, thus excluding those with intermediate levels. The exclusion of the patients with borderline levels would tend to exaggerate the usefulness of the test.

    Reference Standard

It is essential for the usability of any study of diagnostic or prognostic accuracy that a valid reference standard be used to confirm or refute the presence of the disease or outcome. This reference standard should be independent of the diagnostic test or prognostic predictor of interest. To be considered independent, the results of the diagnostic test being studied cannot be used in any way by the reference standard. The reference standard could consist of pathological, laboratory, or radiological confirmation of the presence or absence of the disease. At times, the reference standard might even consist of a consensus-based case definition. Panel members should grade as Class IV those studies that lack an independent reference standard.

Completeness
Ideally, all patients enrolled in the study should have the diagnostic test result (presence of the prognostic variable) and the true presence or absence of the disease (outcome) measured. A study is downgraded to Class II if these variables are measured for less than 80% of subjects.

Masking
For a study to be graded Class I or II, an investigator who is unaware of the results of the diagnostic test (presence or absence of the prognostic predictor) should apply the reference standard to determine the true presence of the disease (or determine the true outcome). In the instance of the case control design, for the study to obtain a Class II grade, an investigator who is unaware of the presence or absence of the disease (or unaware of the outcome) should perform the diagnostic test (measure the prognostic predictor) of interest.

For a study to be graded Class III, the diagnostic test should be performed (or prognostic predictor measured) by investigators other than the investigator who determines the true presence or absence of disease (or determines the outcome).

As with the therapeutic classification, the requirement for masked or independent assessment can be waived if the reference standard for determining the presence of the disease (outcome) and the diagnostic test (prognostic predictor) of interest are objective. An objective measure is one that is unlikely to be affected by expectation bias.

Classifying Evidence for Population Screening Questions
For screening questions, panel members should use the study elements listed below to classify the evidence.

Data Collection
Retrospective collection of data, such as chart reviews, commonly introduces errors related to suboptimal, incomplete measurement. Thus, data collection should be prospective for a study to be classified Class I.

Setting
Studies are often performed by highly specialized centers. Because such centers tend to see more difficult and unusual cases, the patients they treat tend to be nonrepresentative of the patient population considered in the clinical question. In general, because of the potential nonrepresentativeness of patients, these studies from referral centers are graded as Class III. Occasionally, the population of interest targeted in the screening question is primarily patients referred to specialty centers. For example, some conditions that are rare or difficult to treat may be managed only at referral centers. In these circumstances, such studies can be graded Class II.

Studies of patients recruited from nonreferral centers such as primary care clinics or general neurology clinics are more representative. These studies can be graded Class II. Population-based studies tend to be the most representative and can be graded Class I.

Sampling
The ideal methods of selecting patients for a study designed to answer a screening question are selecting all patients or selecting a statistical sample of patients. Each method ensures that the patient sample is representative. Thus, a patient sample that is consecutive, random, or systematic (e.g., every other patient is included) warrants a Class I or II grade. Because patients may potentially be nonrepresentative, a study using a selective sample of patients can be graded only Class III. For example, a study of the yield of MRI in patients with headache that included patients who happened to have head MRIs ordered would be Class III because the sample is selective. A study in which MRIs are performed on all consecutive patients presenting with headache is not selective and would earn a Class I or II grade.

Completeness
For reasons similar to those given in the sampling discussion, it is important that all patients included in the cohort undergo the test of interest. If less than 80% of subjects receive the intervention of interest, the study cannot be graded higher than Class II.

Masking
For a study to be graded Class I or II for a screening question, the intervention of interest (usually a diagnostic test) should be interpreted without knowledge of the patient's clinical presentation.

Again, the requirement for independent or masked assessment can be waived if the interpretation of the diagnostic test is unlikely to be changed by expectation bias (i.e., is objective).

Classifying Evidence for Causation Questions
Particularly relative to patient safety, it may be impractical or unethical to perform RCTs to determine whether a causal relationship exists between an exposure and a disease. A classic example of this is tobacco smoking. Because of known health risks of tobacco use, no one would advocate an RCT to determine whether smoking causes lung cancer. Yet, the epidemiologic evidence for a causal relationship between smoking and cancer is overwhelming.

For such circumstances, the AAN has developed a causation evidence classification scheme. This enables investigators to assess the risk of bias of studies when the primary question is one of causation and conducting RCTs is not feasible.

The causation classification of evidence scheme is quite similar to the prognostic classification scheme. The former places additional emphasis on controlling for confounding differences between exposed and unexposed people. Additionally, minimal thresholds for effect size are prespecified in order for studies to qualify for Class I or II designation. Finally, nonmethodological criteria centering on biologic plausibility are included.

Making Modifications to the Classification of Evidence Schemes
The classification of evidence schemes described above provide general guidance for rating a study's risk of bias relative to a specific clinical question. These general schemes cannot identify all of the potential elements that contribute to bias in all situations. In specific circumstances, there can be nuances that require slight modifications to the schemes. For example, the outcome measures that are judged to be objective (i.e., unlikely to be affected by observer expectation bias) can vary on the basis of the exact clinical question.

Those outcomes that will be considered objective, or any other modification to the classification of evidence schemes, need to be enumerated before study selection and data abstraction commence. This a priori designation of modifications is necessary to reduce the risk of bias being introduced into the review. It is acceptable to modify the classification schemes slightly to fit the specific clinical questions. However, the schemes should not be modified to fit the evidence.

Understanding Measures of Association
Interpreting the importance of the results of a study requires a quantitative measure of the strength of the association between the intervention and the outcome.

For a therapeutic question, quantitative outcomes in the treated population are usually measured relative to an untreated population. The variables used to quantitatively represent the effectiveness of an intervention are termed measures of effectiveness or measures of association. Common measures of effectiveness were introduced in the section describing study extraction and include the relative risk of an outcome (e.g., the proportion of patients with good facial outcomes in patients with Bell's palsy receiving steroids divided by the proportion of good outcomes in those not receiving steroids) and the risk difference (e.g., the proportion of patients with good facial outcomes in patients with Bell's palsy receiving steroids minus the proportion of good outcomes in those not receiving steroids). See appendix 2 for examples of how to calculate these effect measures from contingency tables.
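As a concrete illustration, here is a minimal sketch assuming hypothetical counts (it is not a substitute for appendix 2):

    # Relative risk and risk difference from a 2x2 contingency table.
    def effect_measures(good_treated, n_treated, good_untreated, n_untreated):
        """Return (relative risk, risk difference) for a 'good outcome' event."""
        risk_treated = good_treated / n_treated
        risk_untreated = good_untreated / n_untreated
        relative_risk = risk_treated / risk_untreated
        risk_difference = risk_treated - risk_untreated
        return relative_risk, risk_difference

    # e.g., 80/100 good facial outcomes with steroids vs. 68/100 without:
    rr, rd = effect_measures(80, 100, 68, 100)
    print(f"relative risk = {rr:.2f}, risk difference = {rd:.0%}")  # 1.18, 12%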

For articles of diagnostic or predictive accuracy, relative risks, positive and negative predictive values, likelihood ratios, and sensitivity and specificity values are the outcome variables of interest. See appendix 2 for examples of how to calculate these accuracy measures.
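A hedged sketch of these accuracy measures, again with hypothetical counts, computed from a standard 2x2 table against a reference standard:

    # tp/fp/fn/tn: test result versus the reference (gold) standard.
    def accuracy_measures(tp, fp, fn, tn):
        sens = tp / (tp + fn)        # sensitivity
        spec = tn / (tn + fp)        # specificity
        ppv = tp / (tp + fp)         # positive predictive value
        npv = tn / (tn + fn)         # negative predictive value
        lr_pos = sens / (1 - spec)   # positive likelihood ratio
        lr_neg = (1 - sens) / spec   # negative likelihood ratio
        return sens, spec, ppv, npv, lr_pos, lr_neg

    # e.g., 90 true positives, 10 false positives, 10 false negatives, 90 true negatives:
    print(accuracy_measures(tp=90, fp=10, fn=10, tn=90))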

For screening procedures, the quantitative measure of effect will be the proportion of patients with a clinically significant abnormality identified. (See appendix 2.)


For reporting a measure of association, absolute measures (e.g., risk difference) are preferred to relative measures (e.g., relative risk). Both relative risk and risk difference are calculated from contingency tables and rely on categorical outcome measures, which are, in turn, preferred to continuous outcome measures. If the authors of the article being analyzed provide only continuous outcome measures, include these data in the evidence table.

DID YOU KNOW?
As previously mentioned, the AAN's classification of evidence scheme accounts only for the risk of bias in a study, not for the contribution of chance. Conversely, confidence intervals and p-values do not measure a study's risk of bias. The highest-quality study has both a low risk of bias (Class I) and sufficient precision or power to measure a clinically meaningful difference.
The mathematical tools available for measuring the contribution of chance to a study's results are much more sophisticated than our ability to measure the risk of bias.

Understanding Measures of Statistical Precision
Regardless of the clinical question type or the outcome variable chosen, it is critical that some measure of random error (i.e., the statistical power of each study) be included in the estimate of the outcome. Random error results from chance. Some patients improve and some do not regardless of the intervention used. In any given study, more patients may have improved with treatment than with placebo just because of chance. Statistical measures of precision (or power) gauge the potential contribution of chance to a study's results. In general, the larger the number of patients included in a study, the smaller the contribution of chance to the results.

Including 95% confidence intervals of the outcome measure of interest is usually the best way of gauging the contribution of chance to a study's results. A practical view of confidence intervals is that they show you where you can expect the study results to be if the study were repeated. Most of the time the results would fall somewhere between the upper and lower limits of the confidence interval. In other words, on the basis of chance alone, the study results can be considered to be consistent with any result within the confidence interval.

The p-value is the next best measure of the potential for random error in a study. The p-value indicates the probability that the difference in outcomes observed between groups could be explained by chance alone. Thus a p-value of 0.04 indicates that there is a 4% probability that the differences in outcomes between patient groups in a study are related to chance alone. By convention, a p-value of < 0.05 (less than 5%) is usually required for a difference to be considered statistically significant.

The presence of a statistically significant association can also be determined by inspection of the upper and lower limits of the 95% confidence interval. If the measure of association is the relative risk or odds ratio of an outcome, for example, and the confidence interval includes 1, the study does not show a statistically significant difference. This is equivalent to stating that the p-value is greater than 0.05.

Relative to measures of statistical precision, 95% confidence intervals are preferred over p-values. If p-values are not provided, include measures of statistical dispersion (e.g., standard deviation, standard error, interquartile range).
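A minimal sketch, assuming hypothetical counts and the common Wald approximation, of how a 95% confidence interval for a risk difference can be computed and inspected for significance:

    import math

    def risk_difference_ci(events_t, n_t, events_c, n_c, z=1.96):
        """Return (risk difference, lower limit, upper limit) at ~95% confidence."""
        p_t, p_c = events_t / n_t, events_c / n_c
        rd = p_t - p_c
        se = math.sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)
        return rd, rd - z * se, rd + z * se

    rd, lo, hi = risk_difference_ci(events_t=10, n_t=100, events_c=20, n_c=100)
    # For a risk difference, "no effect" is 0 (for a relative risk it would be 1):
    significant = not (lo <= 0 <= hi)
    print(f"RD = {rd:.2f}, 95% CI {lo:.2f} to {hi:.2f}, significant: {significant}")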

    Interpreting a Study

Armed with the measure of association and its 95% confidence interval, we are in a position to interpret a study's results. Often the temptation here is to determine merely whether the study was positive (i.e., showed a statistically significant association between the intervention and outcome) or negative (did not show a statistically significant association). In interpreting study results, however, four, not two, outcomes are possible. This derives from the fact that there are two kinds of differences you are looking for: whether the difference is statistically significant and whether the difference is clinically important. Henceforth when we use the term significant we mean statistically significant, and when we use the term important we mean clinically important. From these two types of differences, four possible outcomes can be seen:

1. The study showed a significant and important difference between groups.

For example, an RCT of patients with cryptogenic stroke with PFO demonstrates that 10% of patients who had their PFO closed had strokes whereas 20% of patients who did not have their PFO closed had strokes (risk difference 10%, 95% confidence intervals 5% to 15%). This difference is statistically significant (the confidence intervals of the risk difference do not include 0) and clinically important (no one would argue that a finding of 10% fewer strokes is unimportant).

2. The study showed a significant but unimportant difference between groups.

A separate RCT enrolling a large number of patients with cryptogenic stroke with PFO demonstrates that 10.0% of patients who had their PFO closed had strokes whereas 10.1% of patients who did not have their PFO closed had strokes (risk difference 0.1%, 95% confidence intervals 0.05% to 0.15%). This difference is statistically significant but arguably not clinically important (there is only 1 fewer stroke per 1,000 patients with PFO closure).

3. The study showed no significant difference between groups, and the confidence interval was sufficiently narrow to exclude an important difference.

A third RCT enrolling a large number of patients with cryptogenic stroke with PFO demonstrates that 5% of patients who had their PFO closed had strokes whereas 5% of patients who did not have their PFO closed had strokes (risk difference 0%, 95% confidence intervals -0.015% to 0.015%). This difference is not statistically significant. Additionally, the 95% confidence intervals are sufficiently narrow to allow us to confidently exclude a clinically important effect of PFO closure.

DID YOU KNOW?
The Neurology journal editorial policy prohibits use of the term statistically significant in manuscript submissions. Instead authors are advised to use the term significant to convey this statistical concept. For more information on the journal's editorial policy, visit http://submit.neurology.org.


4. The study showed no significant difference between groups, but the confidence interval was too wide to exclude an important difference.

Our last hypothetical RCT of patients with cryptogenic stroke with PFO demonstrates that 5% of patients who had their PFO closed had strokes whereas 5% of patients who did not have their PFO closed had strokes (risk difference 0%, 95% confidence intervals -10% to 10%). This difference is not statistically significant. However, the 95% confidence intervals are too wide to allow us to confidently exclude a clinically important effect of PFO closure. Because of the lack of statistical precision, the study is potentially consistent with an absolute increase or decrease in the risk of stroke of 10%. Most would agree that a 10% stroke reduction is clinically meaningful and important.

    Let us consider these outcomes one at a time.

Scenario 1 represents the clearly positive study and scenario 3 the clearly negative study. A Class I study pertinent to scenario 1 or 3 would best be described as an adequately powered Class I study.

Scenario 2 usually results from a large study. The study has a very high degree of power and can show even minor differences. The minor differences may not be important. The study should be interpreted as showing no meaningful difference. A Class I study pertinent to scenario 2 would best be described as an adequately powered Class I study showing no important difference.

Scenario 4 results from a small study. The study is so underpowered that it is unable to show significant differences even when there might be important differences. It would be inappropriate to interpret this study as negative. A Class I study pertinent to scenario 4 should be described as an inadequately powered Class I study.
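The logic of the four scenarios can be summarized in a short sketch (a hypothetical helper, not part of the AAN scheme) that classifies a risk difference from its 95% confidence interval and a prespecified minimal clinically important difference (MCID); as noted below, the MCID itself is a judgment call:

    # Classify a study result into the four scenarios above.
    # ci_lower/ci_upper: 95% CI bounds for a risk difference (0 = no effect).
    # mcid: the smallest difference the panel considers clinically important.
    def classify_result(ci_lower, ci_upper, mcid):
        significant = not (ci_lower <= 0 <= ci_upper)   # CI excludes "no effect"
        if significant:
            # Scenario 1 vs. 2: is the plausible effect clinically important?
            if max(abs(ci_lower), abs(ci_upper)) >= mcid:
                return "1: significant and important"
            return "2: significant but unimportant"
        # Scenario 3 vs. 4: can the CI rule out an important effect?
        if max(abs(ci_lower), abs(ci_upper)) < mcid:
            return "3: negative; important difference excluded (adequately powered)"
        return "4: inconclusive; too imprecise to exclude an important difference"

    print(classify_result(0.05, 0.15, mcid=0.05))    # scenario 1
    print(classify_result(-0.10, 0.10, mcid=0.05))   # scenario 4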

To be sure, determining what is clinically important involves some judgment. Discussion among panel members will often resolve any uncertainty. When the clinical importance of an effect remains uncertain, it is best to stipulate explicitly in the guideline what you considered clinically important.

The methodological characteristics of each informative study, along with their results, should be summarized in evidence tables. See appendix 5 for a sample evidence table.

PITFALL
A common error when interpreting a study that shows no significant difference between treatment groups is to fail to determine whether the study had adequate power to exclude a clinically important difference. Such a study is not truly negative; rather, it is inconclusive. It lacks the precision to exclude an important difference.

Synthesizing Evidence
Formulating Evidence-based Conclusions
At this step multiple papers pertinent to a question have been analyzed and summarized in an evidence table. These collective data must be synthesized into a conclusion. The goal at this point is to develop a succinct statement that summarizes the evidence in answer to the specific clinical question. Ideally, this summary statement should indicate the magnitude of the effect and the class of evidence on which it is based. The conclusion should be formatted in a way that clearly links it to the clinical question.

Four kinds of information need to be considered when formulating the conclusion:
• The class of evidence
• The measure of association
• The measure of statistical precision (i.e., the random error [the power of the study as manifested by the width of the confidence intervals])
• The consistency between studies

For example, in answer to the clinical question:
For patients with new-onset Bell's palsy,
do oral steroids given within the first 3 days of onset
improve long-term facial outcomes?

The conclusion may read:
For patients with new-onset Bell's palsy,
oral steroids given within the first 3 days of onset of palsy
are likely safe and effective to increase the chance of complete facial functional recovery (rate difference 12%) (two inadequately powered Class I studies and two Class II studies).

In this example, the level of evidence on which the conclusion is based is indicated in two ways: 1) the term likely safe and effective indicates that the effectiveness of steroids is based on moderately strong evidence, and 2) the number and class of evidence on which the conclusion is based are clearly indicated in parentheses. To avoid confusion, you should explicitly indicate in the conclusion when studies have insufficient power to exclude a meaningful difference.

Appendix 6 provides guidance on translating evidence into conclusions.

The level of certainty directly relates to the highest class of evidence with adequate power used to develop the conclusion. Thus, conclusion language will vary on the basis of the following levels of evidence:
• Multiple Class I studies: Are highly likely to be effective
• Multiple Class II studies or a single Class I study: Are likely effective
• Multiple Class III studies or a single Class II study: Are possibly effective
• Multiple Class IV studies or a single Class III study: For patients with new-onset Bell's palsy, there is insufficient evidence to support or refute the effectiveness of steroids in improving facial functional outcomes.

Analogous verbiage is used when studies demonstrate that therapy is ineffective:
• Multiple negative, adequately powered Class I studies: Are highly likely not to be effective / Are highly likely to be ineffective
• Multiple negative, adequately powered Class II studies, or a single adequately powered Class I study: Are likely not effective / Are likely ineffective
• Multiple negative, adequately powered Class III studies, or a single adequately powered Class II study: Are possibly not effective / Are possibly ineffective

DID YOU KNOW?
When formulating evidence-based conclusions the AAN avoids the terms proven effective or established as effective. Evidence is never definitive, and therefore conclusions derived from evidence cannot be proven or definitively established.


• Multiple Class IV studies; a single adequately powered Class III study; or negative, inadequately powered Class I, II, or III studies: For patients with new-onset Bell's palsy, there is insufficient evidence to support or refute the effectiveness of steroids in improving facial functional outcomes.

Please see appendix 6 for a tool to help you construct conclusions.
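As a rough illustration only (the authoritative mapping is the appendix 6 tool; the function and labels here are hypothetical), the certainty language above can be expressed as a lookup keyed by the highest adequately powered class of evidence:

    # Map the highest adequately powered class of positive evidence
    # to AAN-style conclusion language (simplified sketch).
    def conclusion_language(highest_class, multiple_studies):
        levels = {
            ("I", True):   "are highly likely to be effective",
            ("I", False):  "are likely effective",     # single Class I study
            ("II", True):  "are likely effective",
            ("II", False): "are possibly effective",   # single Class II study
            ("III", True): "are possibly effective",
        }
        return levels.get(
            (highest_class, multiple_studies),
            "there is insufficient evidence to support or refute effectiveness",
        )

    print(conclusion_language("I", multiple_studies=True))
    print(conclusion_language("III", multiple_studies=False))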

Accounting for Conflicting Evidence
When all of the studies demonstrate the same result, are of the same class, and are consistent with one another, developing the conclusion is a straightforward matter.

Often, however, this is not the case. The following provides guidance on how to address inconsistent study results.

Consider a hypothetical example where the search strategy identified one Class I study, one Class II study, and one Class III study on the effectiveness of steroids in Bell's palsy. The Class I study shows a significant and important difference from placebo. The Class II and III studies show no significant or important difference from placebo. What should the author panel do? One approach would be to treat each study like a vote. Because the majority of studies (2/3) show no benefit, the panel could conclude that steroids have no effect. This vote-counting approach is not acceptable; it ignores the sources of error within each study.

The appropriate approach to take when faced with inconsistent results in the included studies is to attempt to explain the inconsistencies. The inconsistencies can often be explained by systematic or random error.

Considering Bias First: Basing the Conclusion on the Studies with the Lowest Risk of Bias
The authors should consider systematic error first. In this example, the differences in risk of bias among the studies likely explain the inconsistencies in the results. The Class I study has a lower risk of bias than the Class II or Class III studies. Thus, the results of the Class I study are more likely to be closer to the truth. The Class II and III studies should be discounted, and, if possible, the conclusion formulated should be based solely on the Class I study.

The conclusion would be worded:
Oral steroids are likely effective to…

(The likely effective conclusion is supported when there is a single Class I study used to formulate the recommendation. If we changed this example slightly and included two or more positive Class I