Toward Creating a Gold Standard of Drug Indications from FDA Drug Labels

Ritu Khare¹, Jiao Li², Zhiyong Lu¹
¹National Center for Biotechnology Information (NCBI), U.S. National Library of Medicine, NIH
²Institute of Medical Information, Chinese Academy of Medical Sciences


Posted on 07-May-2015


Page 1: Toward Creating a Gold Standard of Drug Indications from FDA Drug Labels


Page 2: Presentation Order

1. Motivation

2. Materials and Methods

3. Results

4. Discussion

Page 3: Drug-Disease Treatment Relationships

Which drug(s) are approved for treating which disease(s)?

- Most frequently sought information among clinicians (Ely et al. 2000)
- Among the top 10 most frequent multi-concept queries on PubMed (Dogan et al. 2009)

Applications
- Google Knowledge Graph (quick referencing)
- Training biomedical systems (Lu et al. 2013; Li and Lu 2012)
- Controlling errors in EMRs (Khare et al. 2013)

[Figure: a drug linked to Disease 1, Disease 2, and Disease 3]

- Drug indications (e.g., what are the indicated uses of fluoxetine capsules?)
- Disease treatments (e.g., what drugs are prescribed for hypertension?)

Page 4: Gold Standard Properties

1. Factual

2. Structured and normalized

3. Specific to a dose form (e.g., ear drop, oral tablet, topical gel, …)
   Ketorolac injection and ketorolac ophthalmic solution have different indications.

Existing resources

- DrugBank (University of Alberta), MedicineNet (WebMD), DailyMed (National Library of Medicine / FDA)
  - Factual, specific (not structured)
- NDF-RT (U.S. Department of Veterans Affairs), Freebase (Google)
  - Factual, structured (not specific)

[Figure: Drug 1, Drug 2, Drug 3, each identified by an RxNorm RXCUI, linked to Disease 1, Disease 2, Disease 3, each identified by a UMLS CUI]

Page 5: DailyMed: The Drug Indication Data Store

- Drug database of the National Library of Medicine (NLM)
- Most recent drug labels (or package inserts) submitted to the FDA by various pharmaceutical companies

Factual: yes | Structured: no | Dose-form specific: yes

Page 6: Identify Indications from Drug Labels: The Challenges

Drug indication excerpts in DailyMed

d1: "Dutasteride capsules are indicated for the treatment of symptomatic benign prostatic hyperplasia. Dutasteride is not approved for the prevention of prostate cancer."
    Challenge: contraindication

d2: "Ranitidine is indicated in the treatment of GERD. Concomitant antacids should be given for pain relief to patients with GERD."
    Challenge: another drug's indication

d3: "In patients with coronary heart disease, but with multiple risk factors for coronary heart disease such as retinopathy, albuminuria, smoking, or hypertension."
    Challenge: risk factors

Page 7: Related Studies with DailyMed Indications

1. Neveol and Lu (2010)
   - SemRep (tool for identifying relationships)
   - 1,263 ingredients
   - 73% accuracy

2. Wei et al. (2013)
   - Used SIDER2 (based on DailyMed)
   - 1,554 ingredients
   - 67% accuracy

3. Fung et al. (2013)
   - MetaMap (biomedical concept recognizer)
   - 2,105 drugs (ingredient + dose form)
   - 77% accuracy (on 295 drugs)

- Biomedical text mining tools (65-80% accuracy)
- Expert annotation

Page 8: Presentation Order

1. Motivation

2. Materials and Methods

3. Results

4. Discussion

Page 9: DailyMed: Dataset

- Downloaded: August 24, 2012 version
- Multiple drug labels exist for the same drug from different manufacturers
  - Clustered the drug labels using RxNorm identifiers
  - Determined a representative drug label for each cluster
- Frequently sought drugs
  - 303 ingredients are the most frequently sought, accounting for 80% of accesses on PubMed Health (query logs, 2010-2011)
  - Top drugs: clonazepam (Klonopin), acetaminophen (Tylenol), azithromycin (Zithromax), …

18,353 human prescription drug labels
→ 2,497 unique drug labels (based on RxNorm identifiers)
→ 504 frequent drug labels
→ 100 drug labels (randomly selected)
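The slide does not say how the representative label of each cluster was chosen. The Python sketch below shows one way the clustering step could look, assuming each label record carries an RxNorm identifier (`rxcui`). The `DrugLabel` fields, the sample records, and the rule of keeping the label with the longest indications text are hypothetical choices for illustration, not the authors' procedure.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class DrugLabel:
    setid: str        # hypothetical label identifier
    rxcui: str        # RxNorm identifier (ingredient + dose form)
    indications: str  # text of the label's indications section

def cluster_by_rxcui(labels):
    """Group labels from different manufacturers that share an RxNorm identifier."""
    clusters = defaultdict(list)
    for label in labels:
        clusters[label.rxcui].append(label)
    return dict(clusters)

def representative(cluster):
    """Hypothetical rule: keep the label with the longest indications text.
    On ties, max() returns the first label encountered."""
    return max(cluster, key=lambda lab: len(lab.indications))

# Invented sample records, for illustration only.
labels = [
    DrugLabel("A", "rx1", "Indicated for hypertension."),
    DrugLabel("B", "rx1", "Indicated for the treatment of hypertension."),
    DrugLabel("C", "rx2", "Indicated for GERD."),
]
clusters = cluster_by_rxcui(labels)
reps = {rxcui: representative(c).setid for rxcui, c in clusters.items()}
print(reps)  # {'rx1': 'B', 'rx2': 'C'}
```

Any other deterministic rule (most recent version, a preferred manufacturer) would slot into `representative` without changing the clustering step.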

Page 10: Mining Indications from Drug Labels

1. Automatically identify the candidate indications from drug labels.

2. Display the drug labels with pre-computed candidate indications, i.e., pre-annotations (Neveol et al. 2011), on an annotation interface.

3. Two expert annotators accept or reject the pre-annotations.
   - Educational background: medical and library sciences
   - Training: biomedical literature indexing

Page 11: Method: Identify Disease Mentions from Drug Labels

UMLS-based disease lexicon
- Seed concepts: UMLS CUIs
- Vocabularies: MeSH, SNOMED CT
- Semantic types: the 12 types belonging to the "Disorders" semantic group
- Terms:
  - Removed: acronyms, abbreviations, fully specified names, and stop words
  - Included: all English-language, non-suppressed synonyms and their normalized strings (NLM's normalization tool NORM)

Extracting disease mentions from drug labels
- Tokenized the label text and generated candidate token sequences of length 1-6
- Matched all candidates, and their normalized versions, against the lexicon terms
- Resolved overlapping mentions, e.g., "arthritis" and "rheumatoid arthritis", by choosing the more specific (longer) match
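The matching steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: `simple_norm` is a hypothetical and much cruder stand-in for NLM's NORM tool, tokenization is a simple regular expression, and the concept IDs (`CUI_RA`, etc.) are placeholders rather than real UMLS CUIs.

```python
import re

MAX_LEN = 6  # candidate mentions are 1-6 tokens long, as on the slide

def simple_norm(text):
    """Hypothetical stand-in for NLM's NORM: lowercase, drop punctuation,
    sort words. The real NORM also uninflects words and removes stop words."""
    return " ".join(sorted(re.findall(r"[a-z0-9]+", text.lower())))

def build_lexicon(terms):
    """terms maps each synonym string to a concept ID; index raw and normalized forms."""
    lexicon = {}
    for term, cui in terms.items():
        lexicon[term.lower()] = cui
        lexicon[simple_norm(term)] = cui
    return lexicon

def find_mentions(text, lexicon):
    """Return sorted (start_token, matched_text, concept_id) for non-overlapping matches."""
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    candidates = []
    for start in range(len(tokens)):
        for length in range(1, MAX_LEN + 1):
            if start + length > len(tokens):
                break
            span = " ".join(tokens[start:start + length])
            # Try the raw candidate first, then its normalized version.
            cui = lexicon.get(span.lower()) or lexicon.get(simple_norm(span))
            if cui:
                candidates.append((start, start + length, span, cui))
    # Overlap resolution: accept longer (more specific) matches first.
    candidates.sort(key=lambda c: (c[0] - c[1], c[0]))
    covered, mentions = set(), []
    for start, end, span, cui in candidates:
        if covered.isdisjoint(range(start, end)):
            covered.update(range(start, end))
            mentions.append((start, span, cui))
    return sorted(mentions)

lexicon = build_lexicon({
    "arthritis": "CUI_ARTH",             # placeholder IDs, not real CUIs
    "rheumatoid arthritis": "CUI_RA",
    "hypertension": "CUI_HTN",
})
print(find_mentions("Indicated for rheumatoid arthritis and hypertension.", lexicon))
# [(2, 'rheumatoid arthritis', 'CUI_RA'), (5, 'hypertension', 'CUI_HTN')]
```

Because normalized forms are indexed too, a reordered variant such as "arthritis rheumatoid" also resolves to `CUI_RA`, while the shorter "arthritis" match inside "rheumatoid arthritis" is discarded by the overlap rule.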


Page 12: Annotation Interface

Page 13: Annotation Workflow: Two Rounds

Round 1
- Pre-annotations = all disease mentions
- A1 and A2 (the two annotators) independently perform annotations

Round 2
- Pre-annotations (color-coded) = (i) exclusive judgments, i.e., mentions selected by only one annotator, and (ii) round-1 pre-annotations not selected by either annotator
- A1 and A2 independently improve their previous annotations
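The composition of the round-2 pre-annotations amounts to two set operations over the round-1 judgments. A minimal Python sketch, assuming each judgment can be identified by a hashable key (here just the mention text); the color labels are hypothetical names, since the slide only says the pre-annotations are color-coded.

```python
def round2_preannotations(preannotations, a1, a2):
    """Build round-2 pre-annotations from the round-1 judgments of A1 and A2.

    Returns {mention: color}; the color names are hypothetical labels.
    Common judgments (selected by both annotators) are left out of this result.
    """
    exclusive = a1 ^ a2                      # (i) selected by exactly one annotator
    unselected = preannotations - (a1 | a2)  # (ii) selected by neither annotator
    colors = {m: ("A1_only" if m in a1 else "A2_only") for m in exclusive}
    colors.update({m: "neither" for m in unselected})
    return colors

# Invented example judgments.
round1 = {"GERD", "pain", "panic disorder", "agoraphobia"}
a1 = {"GERD", "panic disorder"}
a2 = {"GERD", "pain"}
print(sorted(round2_preannotations(round1, a1, a2).items()))
# [('agoraphobia', 'neither'), ('pain', 'A2_only'), ('panic disorder', 'A1_only')]
```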


Page 14: Annotation Workflow: Sets and Guidelines

Sets
- 100 drug labels, annotated 50 drug labels at a time
  - Set-1 (avg. 126 words per drug label)
  - Set-2 (avg. 249 words per drug label)
- Annotation began with preliminary guidelines

Annotation order (each step is followed by error analysis and a guideline update)
1. Set-1, round 1
2. Set-1, round 2
3. Set-2, round 1
4. Set-2, round 2

Page 15: Preliminary Annotation Guidelines: Examples

What to annotate
- Select all types of indications (treatment, relief, prevention, …)

What NOT to annotate
- Do not select medical procedures

Page 16: Evaluation

Ground truth evaluation
- Ground truth for the 100-drug-label dataset
- Three study investigators reviewed the drug labels, derived the indicated usages and their UMLS concepts, and consulted NDF-RT and PubMed Health
- Total: 461 ground-truth indications

1. Pre-annotation performance
   - Precision, recall

2. Annotator performance
   - Common judgments (both annotators agree)
   - Joint performance: recall, precision, F1-measure
   - Inter-annotator agreement (Jaccard) = num_match / (num_match + num_nonmatch)
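The evaluation measures above can be written out as a short Python sketch. It assumes that judgments and ground-truth indications are sets of comparable items (e.g., UMLS CUIs per drug label), that the joint output is the set of common judgments, and that num_nonmatch counts judgments made by only one annotator; under that reading the agreement formula equals the Jaccard index |A1 ∩ A2| / |A1 ∪ A2|. The concept IDs are placeholders.

```python
def precision_recall_f1(predicted, gold):
    """Set-based precision, recall, and F1 against ground-truth indications."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def jaccard_agreement(a1, a2):
    """num_match / (num_match + num_nonmatch), i.e. |A1 & A2| / |A1 | A2|."""
    num_match = len(a1 & a2)     # judgments made by both annotators
    num_nonmatch = len(a1 ^ a2)  # judgments made by only one annotator
    total = num_match + num_nonmatch
    # Convention chosen here: full agreement if neither annotator selected anything.
    return num_match / total if total else 1.0

gold = {"C1", "C2", "C3", "C4"}  # placeholder concept IDs
a1 = {"C1", "C2", "C3", "X"}
a2 = {"C1", "C2", "C4"}
joint = a1 & a2                  # assumption: common judgments form the joint output
print(precision_recall_f1(joint, gold))  # precision 1.0, recall 0.5, F1 about 0.667
print(jaccard_agreement(a1, a2))         # 0.4
```

Scoring only the common judgments explains why joint precision can be nearly perfect while recall suffers: anything one annotator missed drops out of the joint set.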


Page 17: Presentation Order

1. Motivation

2. Materials and Methods

3. Results

4. Discussion

Page 18: Pre-annotation Quality

850 pre-annotations (UMLS CUIs) for 100 drug labels

Precision: 51.88%
Remaining disease mentions (not indications):
- Contraindications
  "This drug should not be used for treating type I diabetes."
- Part of an organization's name
  "The Advisory Council for the Elimination of Tuberculosis, the American Thoracic Society, …"
- Characteristics of an indication
  "A major depressive episode implies a prominent and relatively persistent depressed or dysphoric mood that usually interferes with daily functioning"
- Symptoms, organism names, risk factors

Recall: 95.67%
Missed cases:
- Natural language challenges
  Identifying "skin infections" from "skin and soft tissue infections"
- Limitations of the lexicon
  The concepts "tick fever" and "pylori infection" were not included.
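As a rough consistency check, and assuming both percentages are computed against the 461 ground-truth indications from Page 16, the reported precision and recall imply about 441 correct pre-annotations:

```latex
\mathrm{TP} \approx 0.5188 \times 850 \approx 441, \qquad
\mathrm{Recall} = \frac{\mathrm{TP}}{461} \approx \frac{441}{461} \approx 0.957
```

That leaves roughly 850 - 441 = 409 pre-annotations that were disease mentions but not indications, in line with the "about half" noted in the conclusions.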


Page 19: Judgment (Expert Annotation) Assessment

Number of drug labels and duration

Set     Round-1 #drug labels   Round-2 #drug labels   Avg. total time/annotator
Set-1   50                     22                     124 min
Set-2   50                     28                     173 min

Joint performance
- Nearly perfect joint precision
- F1-measure improved from round 1 to round 2 (avg. improvement of 7.5%)

Inter-annotator agreement
- Set-1: 76.2%
- Set-2: 93.9%
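Assuming each reported total time covers both rounds and is spread over the 50 drug labels in its set, the per-label effort works out as:

```latex
\frac{124}{50} \approx 2.5\ \text{min}, \qquad
\frac{173}{50} \approx 3.5\ \text{min}, \qquad
\frac{124 + 173}{100} = 2.97 \approx 3\ \text{min per drug label}
```

This agrees with the average of about 3 minutes per drug label per annotator reported in the conclusions; the longer Set-2 labels (avg. 249 vs. 126 words) account for the higher per-label time.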

Page 20: Error Analysis

Set 1
- Missed indications
  "Alprazolam is also indicated for the treatment of panic disorder, with or without agoraphobia"
- Incorrect judgments
  - Selecting symptoms
    "Panic disorder is characterized by following symptoms: palpitations, pounding heart, or accelerated heart rate …"
  - Selecting indications of other drugs
    "Cimetidine hydrochloride injection is indicated for the short term treatment of active duodenal ulcer. Concomitant antacids should be given as needed for relief of pain."

Set 2
- Missed indications
  - Drug labels were long (up to 800 words)
- Incorrect judgments
  - Selecting species names
    "Respiratory tract infections caused by Streptococcus pneumoniae"
  - Selecting conditions (e.g., sedation) caused by the drug

Page 21: Updated Guidelines

What NOT to annotate
1. Contraindications
2. Indicated usages of another drug
3. Disease mentions that are part of an organization's name
4. Explicitly specified symptoms
5. Species or organism names
6. Medical procedures
7. Characteristics of an indication
8. Risk factors

What to annotate
1. All indicated usages
2. All types of indications (treat, prevent, manage, relieve, …)
3. Main and associated indications
4. Indications treated by a combination of drugs
5. Efficacy established in clinical trials

Special cases of annotation
1. Causing disease
2. Optional indication
3. In patients with a disease

Page 22: Updated Guidelines: Special Cases (Need Domain Knowledge)

1. Causing indication
   - Hydroxyzine Hydrochloride: "Useful in the management of pruritus due to allergic conditions such as chronic urticaria and atopic and contact dermatoses"
   - Diclofenac Epolamine: "Flector Patch is indicated for the topical treatment of acute pain due to minor strains, sprains, and contusions"

2. Optional indication
   - Fluoxetine Hydrochloride: "Acute treatment of Panic Disorder, with or without agoraphobia, in adult patients"
   - Alprazolam: "Alprazolam is also indicated for the treatment of panic disorder, with or without agoraphobia."

3. In patients/adults with a disease
   - Azithromycin: "Azithromycin tablet is indicated for the prevention of disseminated Mycobacterium avium complex (MAC) disease in persons with advanced HIV infection."
   - Keppra: "KEPPRA XR™ is indicated as adjunctive therapy in the treatment of partial onset seizures in patients ≥16 years of age with epilepsy."

Page 23: Presentation Order

1. Motivation

2. Materials and Methods

3. Results

4. Discussion

Page 24: Conclusions

- Semi-automatic method (NLP + annotation by two experts)
  - Toward a factual, structured, and specific gold standard
- Promising performance, using joint judgments as the gold standard
  - Avg. 3 min per drug label for each annotator
  - Avg. F1-measure = 0.95
- First study involving annotation of drug indications
  - Specific and detailed indication annotation guidelines: what to annotate, what not to annotate, special cases
- Challenges
  - About half of the disease mentions (pre-annotations) were not indications
  - Long drug label texts
  - Special cases of annotation

Page 25: Limitations and Future Work

Framework
- Pre-process drug labels for improved presentation and summarization
- The algorithm for preparing pre-annotations needs more sophisticated text mining techniques (e.g., MetaMap, NegEx)
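The Framework items above name NegEx as one candidate technique. The Python sketch below is not NegEx; it is a deliberately simplified, hypothetical look-back filter in the same spirit, assuming a tiny trigger list (`PRE_TRIGGERS`) and a fixed six-token window, applied to excerpts quoted earlier in this deck. Real NegEx uses curated pre- and post-mention triggers, pseudo-triggers, and scope terminators.

```python
import re

# Hypothetical trigger list and window size, chosen for illustration only.
PRE_TRIGGERS = {"not", "no", "contraindicated"}
WINDOW = 6  # number of tokens to look back from the mention

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def is_negated(sentence, mention):
    """True if a trigger occurs within WINDOW tokens before an occurrence of mention."""
    tokens, target = tokenize(sentence), tokenize(mention)
    for i in range(len(tokens) - len(target) + 1):
        if tokens[i:i + len(target)] == target:
            if PRE_TRIGGERS.intersection(tokens[max(0, i - WINDOW):i]):
                return True
    return False

excerpts = [
    ("Dutasteride capsules are indicated for the treatment of symptomatic "
     "benign prostatic hyperplasia.", "benign prostatic hyperplasia"),
    ("Dutasteride is not approved for the prevention of prostate cancer.",
     "prostate cancer"),
    ("This drug should not be used for treating type I diabetes.",
     "type I diabetes"),
]
for sentence, mention in excerpts:
    print(mention, "->", "reject" if is_negated(sentence, mention) else "keep")
# benign prostatic hyperplasia -> keep
# prostate cancer -> reject
# type I diabetes -> reject
```

A filter like this only targets the contraindication-style errors; other failure modes from Page 6 (another drug's indication, risk factors) need more than negation detection.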

Evaluation
- Different pair(s) of annotators
- Compare the gold standard with existing resources/studies
- Classification ability of the annotated corpus

Current status
- 534 unique drug labels curated (~7,688 drug labels)
- 272 frequently sought ingredients

Page 26: Acknowledgments

- Grant: Intramural Research Program of the NIH, National Library of Medicine
- Two human annotators: Zanmei Li, Yujing Ji
- Biomedical Text Mining Group at NCBI: Robert Leaman, Yuqing Mao, Chih-Hsuan Wei